BLE-LTER / MetaEgress

R package to create Ecological Metadata Language documents from an instance of LTER-core-metabase database schema
https://BLE-LTER.github.io/MetaEgress/

Unexpected results and invalid EML with "expand_taxa=FALSE" in create_EML #80

Open gremau opened 1 year ago

gremau commented 1 year ago

I recently updated MetaEgress and changed my workflow to use the "expand_taxa" and "skip_taxa" arguments to the create_EML function. When I set "expand_taxa=TRUE", my taxa are expanded into a nice tree in the resulting EML. When I set "expand_taxa=FALSE", an invalid EML document is produced. No taxa expansion happens (as expected), but some elements in the resulting <taxonomicClassification> element won't validate (I think <commonname> is the problem, but I'm not sure). One good thing about "expand_taxa=FALSE" is that there is a <taxonId> element with the provider="https://itis.gov" attribute. This element does not appear with "expand_taxa=TRUE", where I was originally expecting it.

It seems that "expand_taxa=FALSE" should still give valid EML with a taxonomicCoverage element, but I'm not sure where things are going wrong. Let me know if anyone has thoughts on how to correct this. Two EML documents are attached (the =FALSE and =TRUE cases).

knb-lter-jrn.210121001.62_expandfalse.xml.txt

knb-lter-jrn.210121001.62_expandtrue.xml.txt

twhiteaker commented 1 year ago

Did your parser tell you why the EML was invalid?

Without An, we've lost a lot of expertise to debug R stuff at BLE.

twhiteaker commented 1 year ago

One issue is that commonname should be commonName. https://github.com/NCEAS/eml/blob/main/xsd/eml-coverage.xsd#L1119

It's also weird that commonname appears in the expand_taxa=FALSE version but not the expand_taxa=TRUE version.

Another issue is that the rank name and value should occur before the common name and taxonId, I think, since this is an xs:sequence. https://github.com/NCEAS/eml/blob/main/xsd/eml-coverage.xsd#L1086
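For anyone who wants to check an exported file quickly, here is a minimal sketch of both checks (element spelling and child order) in Python. It is independent of MetaEgress. The child order in EXPECTED_ORDER is my reading of the xs:sequence in eml-coverage.xsd, so treat it as an assumption, and the sample XML and its taxonId value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed child order for <taxonomicClassification>, read from the
# xs:sequence in eml-coverage.xsd. Verify against the schema before relying on it.
EXPECTED_ORDER = [
    "taxonRankName",
    "taxonRankValue",
    "commonName",
    "taxonId",
    "taxonomicClassification",
]


def local_name(tag):
    """Drop a '{namespace}' prefix if the parser added one."""
    return tag.rsplit("}", 1)[-1]


def find_order_problems(root):
    """Return the child-name list of every taxonomicClassification that
    contains an unknown element name or has children out of sequence."""
    problems = []
    for elem in root.iter():
        if local_name(elem.tag) != "taxonomicClassification":
            continue
        names = [local_name(child.tag) for child in elem]
        unknown = [n for n in names if n not in EXPECTED_ORDER]
        positions = [EXPECTED_ORDER.index(n) for n in names if n in EXPECTED_ORDER]
        if unknown or positions != sorted(positions):
            problems.append(names)
    return problems


# Made-up fragment that mimics the reported symptoms (misspelled element,
# rank elements after commonname/taxonId). The taxonId value is a placeholder.
SAMPLE = """<taxonomicCoverage>
  <taxonomicClassification>
    <commonname>black grama</commonname>
    <taxonId provider="https://itis.gov">12345</taxonId>
    <taxonRankName>species</taxonRankName>
    <taxonRankValue>Bouteloua eriopoda</taxonRankValue>
  </taxonomicClassification>
</taxonomicCoverage>"""

if __name__ == "__main__":
    for names in find_order_problems(ET.fromstring(SAMPLE)):
        print("unknown element or wrong order:", names)
```

To run it on a real export, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(). This only flags ordering and spelling; it is not a substitute for validating against the schema.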

gremau commented 1 year ago

When validating the "expand_taxa=FALSE" file the first error I get is

"Error at line 217, column 23: no declaration found for element 'commonname'"

I then tried replacing commonname with commonName (camel case) and got a different error:

"Error at line 221, column 35: element 'taxonRankName' is not allowed for content model '(taxonRankName?,taxonRankValue?,commonName,taxonId,taxonomicClassification*)'"

Bummer about An leaving BLE! She was great, and I hope we can keep MetaEgress and other projects going in her absence.

gremau commented 1 year ago

Good catch. At some point I can try to debug (might not be today though) - it seems like there is probably a logic issue and elements need to be reordered in the R code to satisfy the schema, but I'm not totally clear where to find that yet.
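Until the R code is sorted out, a stopgap is to post-process the exported XML. The sketch below is Python and is not the MetaEgress fix: it renames commonname to commonName and sorts the children of each taxonomicClassification into the order given in the content model from the validator error. The function name fix_classifications and the sample fragment are hypothetical, and it assumes the taxonomic elements carry no namespace prefix.

```python
import xml.etree.ElementTree as ET

# Child order taken from the content model quoted in the validator error.
SCHEMA_ORDER = [
    "taxonRankName",
    "taxonRankValue",
    "commonName",
    "taxonId",
    "taxonomicClassification",
]
RANK = {name: i for i, name in enumerate(SCHEMA_ORDER)}


def fix_classifications(root):
    """Rename misspelled commonname elements and reorder children in place.

    Hypothetical helper for post-processing an exported EML tree.
    """
    # Materialise the list first so reordering does not disturb iteration.
    for tc in list(root.iter("taxonomicClassification")):
        for child in tc:
            if child.tag == "commonname":
                child.tag = "commonName"
        # sorted() is stable, so repeated elements keep their relative order;
        # unrecognised elements are pushed to the end.
        tc[:] = sorted(tc, key=lambda c: RANK.get(c.tag, len(RANK)))
    return root


# Made-up fragment with the reported problems; the taxonId value is a placeholder.
SAMPLE = """<taxonomicCoverage>
  <taxonomicClassification>
    <commonname>black grama</commonname>
    <taxonId provider="https://itis.gov">12345</taxonId>
    <taxonRankName>species</taxonRankName>
    <taxonRankValue>Bouteloua eriopoda</taxonRankValue>
  </taxonomicClassification>
</taxonomicCoverage>"""

if __name__ == "__main__":
    fixed = fix_classifications(ET.fromstring(SAMPLE))
    print(ET.tostring(fixed, encoding="unicode"))
```

The result should still be run through a schema validator, since this only addresses the two problems seen so far.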

twhiteaker commented 1 year ago

Oh, it was also strange to me that commonName didn't show up in your expanded version. Common name does appear in the expanded versions that An generated for BLE.

twhiteaker commented 1 year ago

I think commonname in line 179 of assemble_taxonomy.R needs to be commonName. As for rank name and value being out of order, I don't know yet.

twhiteaker commented 1 year ago

In 583a746e3119aad1d017ab8b9c01d0abb09c8806, I fixed the spelling to commonName. I exported EML with the expand_taxa option set to TRUE and again with it set to FALSE. Both outputs validated using this parser.

However, I don't see any common names in the output.

twhiteaker commented 1 year ago

For expand_taxa=FALSE, MetaEgress reads from Metabase. All my common names were empty. Once I entered a common name, it showed up in EML.

For expand_taxa=TRUE, I assume MetaEgress calls something like taxize to pull the info. WoRMS doesn't return common names as far as I can tell, and all of my datasets use WoRMS. @gremau, do you have a dataset that uses some other provider, one that does provide common names, that you can test with?
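To compare the two outputs when testing this, a small Python sketch like the one below lists which taxonomicClassification entries have no non-empty commonName. The function name and the sample fragment are illustrative only, and it assumes the elements carry no namespace prefix. In an expanded tree, higher ranks will usually show up as missing, so the species-level entries are the ones worth looking at.

```python
import xml.etree.ElementTree as ET


def taxa_missing_common_name(root):
    """Return the taxonRankValue of each taxonomicClassification that has
    no direct commonName child with non-empty text."""
    missing = []
    for tc in root.iter("taxonomicClassification"):
        value = (tc.findtext("taxonRankValue") or "").strip()
        common = [(c.text or "").strip() for c in tc.findall("commonName")]
        if not any(common):
            missing.append(value)
    return missing


# Illustrative nested fragment: the genus has no common name, the species does.
SAMPLE = """<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>genus</taxonRankName>
    <taxonRankValue>Spartina</taxonRankValue>
    <taxonomicClassification>
      <taxonRankName>species</taxonRankName>
      <taxonRankValue>Spartina alterniflora</taxonRankValue>
      <commonName>smooth cordgrass</commonName>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>"""

if __name__ == "__main__":
    print(taxa_missing_common_name(ET.fromstring(SAMPLE)))
```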