Open gremau opened 1 year ago
Did your parser tell you why the EML was invalid?
Without An, we've lost a lot of expertise to debug R stuff at BLE.
One issue is that commonname should be commonName. https://github.com/NCEAS/eml/blob/main/xsd/eml-coverage.xsd#L1119
It's also weird that commonname appears in the expand=false version but not the expand=true version.
Another issue is that the rank name and value should occur before the common name and taxonId, I think, since this is an xs:sequence. https://github.com/NCEAS/eml/blob/main/xsd/eml-coverage.xsd#L1086
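In case it helps with triage, here is a rough Python sketch (not part of MetaEgress) that scans an EML tree for element names that differ from the schema names only by letter case, which is the commonname vs. commonName problem described above. The `EXPECTED` list and the `miscased_tags` helper are my own assumptions for illustration, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET

# Element names as spelled in eml-coverage.xsd (assumed list, for illustration only).
EXPECTED = [
    "taxonRankName",
    "taxonRankValue",
    "commonName",
    "taxonId",
    "taxonomicClassification",
]


def miscased_tags(root):
    """Return sorted (found, expected) pairs for tags that match only case-insensitively."""
    by_lower = {name.lower(): name for name in EXPECTED}
    found = set()
    for el in root.iter():
        want = by_lower.get(el.tag.lower())
        if want is not None and el.tag != want:
            found.add((el.tag, want))
    return sorted(found)


# Made-up fragment with the misspelled element.
root = ET.fromstring(
    "<taxonomicClassification>"
    "<commonname>placeholder</commonname>"
    "<taxonId provider='https://itis.gov'>0</taxonId>"
    "</taxonomicClassification>"
)
print(miscased_tags(root))  # [('commonname', 'commonName')]
```

To check a real export, `ET.parse(path).getroot()` could be passed in place of the inline fragment.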
When validating the "expand_taxa=FALSE" file the first error I get is
"Error at line 217, column 23: no declaration found for element 'commonname'"
I then tried replacing commonname with commonName (camelcase) and got a different error:
"Error at line 221, column 35: element 'taxonRankName' is not allowed for content model '(taxonRankName?,taxonRankValue?,commonName,taxonId,taxonomicClassification*)'"
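For anyone who wants to confirm that ordering alone explains the second error, below is a minimal Python sketch that sorts the children of each taxonomicClassification into the order given by the content model in that message. It is a post-processing workaround on the exported XML, not a fix for the R code; `SEQUENCE` and `reorder_classification` are hypothetical names, and the sample values are placeholders.

```python
import xml.etree.ElementTree as ET

# Child order copied from the content model quoted in the validator error.
SEQUENCE = [
    "taxonRankName",
    "taxonRankValue",
    "commonName",
    "taxonId",
    "taxonomicClassification",
]


def reorder_classification(elem):
    """Recursively sort the children of a taxonomicClassification into SEQUENCE order."""
    rank = {name: i for i, name in enumerate(SEQUENCE)}
    # sorted() is stable, so repeated elements keep their relative order;
    # unknown tags are pushed to the end rather than dropped.
    elem[:] = sorted(elem, key=lambda c: rank.get(c.tag, len(SEQUENCE)))
    for child in elem:
        if child.tag == "taxonomicClassification":
            reorder_classification(child)
    return elem


# Made-up fragment with commonName and taxonId ahead of the rank elements.
sample = ET.fromstring(
    "<taxonomicClassification>"
    "<commonName>placeholder</commonName>"
    "<taxonId provider='https://itis.gov'>0</taxonId>"
    "<taxonRankName>species</taxonRankName>"
    "<taxonRankValue>placeholder</taxonRankValue>"
    "</taxonomicClassification>"
)
print([c.tag for c in reorder_classification(sample)])
# ['taxonRankName', 'taxonRankValue', 'commonName', 'taxonId']
```

This only fixes element order. It does not check how many times each element may occur, so the result still needs to go through a real schema validator.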
Bummer about An leaving BLE! She was great, and I hope we can keep MetaEgress and other projects going in her absence.
Good catch. At some point I can try to debug (might not be today though) - it seems like there is probably a logic issue and elements need to be reordered in the R code to satisfy the schema, but I'm not totally clear where to find that yet.
Oh, it was also strange to me that commonName didn't show up in your expanded version. Common name does appear in the expanded versions that An generated for BLE.
I think commonname in line 179 of assemble_taxonomy.R needs to be commonName. As for rank name and value being out of order, I don't know yet.
In 583a746e3119aad1d017ab8b9c01d0abb09c8806, I fixed the spelling to commonName. I exported EML with the expand_taxa option set to TRUE and again set to FALSE. Both outputs validated using this parser.
However, I don't see any common names in the output.
For expand_taxa=FALSE, MetaEgress reads from Metabase. All my common names were empty. Once I entered a common name, it showed up in EML.
For expand_taxa=TRUE, I assume MetaEgress calls something like taxize to pull the info. WoRMS doesn't return common names as far as I can tell, and all of my datasets use WoRMS. @gremau do you have a dataset that uses some other provider that does provide common names, that you can test with?
I recently updated MetaEgress and updated my workflow to use the "expand_taxa" and "skip_taxa" arguments to the create_EML function. When I set "expand_taxa=TRUE" my taxa are expanded into a nice tree in the resulting EML. When I set "expand_taxa=FALSE" an invalid EML document is produced. No taxa expansion happens (as expected), but there are some elements in the resulting \<taxonomicClassification> element that won't validate (I think \<commonname> is the problem but not sure). One good thing about "expand_taxa=FALSE" is that there is a \<taxonId> element with the provider="https://itis.gov" attribute. This element does not appear with "expand_taxa=TRUE" as I was originally expecting.
It seems that "expand_taxa=FALSE" should still give valid EML with a taxonomicCoverage element, but I'm not sure where things are going wrong. Let me know if anyone has thoughts on how to correct this. Two EML documents are attached (the =FALSE and =TRUE cases):
knb-lter-jrn.210121001.62_expandfalse.xml.txt
knb-lter-jrn.210121001.62_expandtrue.xml.txt