EDIorg / EMLassemblyline

R package for creating EML metadata
https://ediorg.github.io/EMLassemblyline/
MIT License

template_taxonomic_coverage() : add user-custom taxa authorities #50

Open earnaud opened 4 years ago

earnaud commented 4 years ago

Hi EDI team,

Is it possible (or easily feasible) to allow the user to add a custom taxonomic authority? In my case, I would like to fetch taxonomic values from TaxRef.

Cheers.

clnsmth commented 4 years ago

Absolutely possible, but it will require some refactoring. To do this, we could add a TaxRef resolver function to template_taxonomic_coverage() and the associated taxonomic rank expander to make_eml(), but this would still exclude inputs from unsupported authorities.

An alternative implementation could be to format taxonomic_coverage.txt in a way that accommodates varying taxonomic ranks and their corresponding annotations. The file could then be created manually if necessary, or generated automatically by template_taxonomic_coverage() if the authority is supported. In this case, the rank expander would live in template_taxonomic_coverage() rather than make_eml().

earnaud commented 4 years ago

I was thinking more of a function like new_authority(), which would require the user to fill in some (many?) fields and would then allow them to use this authority like those listed in taxonomyCleanr::view_authority(). I will try to write such a function, based on how authorities are used in template_taxonomic_coverage() and make_eml().

juddpatterson commented 3 years ago

I was curious whether I've missed any enhancements, as mentioned here, that support custom taxonomic authorities. In a recent project we used USGS BioData as our taxonomic authority, so we created a custom taxonomic_coverage.txt that mimics the one created for supported authorities such as ITIS, GBIF, etc. While I think the package could issue a warning and still proceed, we instead see this message:

Taxonomic coverage (Required) - Taxonomic coverage will be dropped from the EML until these issues are fixed:

  1. Unsupported authorities for entries: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50. Supported authorities are: ITIS, World Register of Marine Species, GBIF Backbone Taxonomy, Tropicos - Missouri Botanical Garden

I'd love to see a way for us to use a custom taxonomic authority. Thanks for all the great work!

clnsmth commented 3 years ago

Hi @juddpatterson, thanks for this inquiry (and reminder of this issue)! You haven't missed anything regarding support for other authority systems, but version 3.0.0 refactored how EAL reports input issues, which is what you're now encountering.

You're right, unsupported authorities should pass "as is" from the template to the taxonomicCoverage EML element. I will elevate the priority of this issue and should have the feature implemented early next week. Supporting "other" authorities in the capacity we do for ITIS, WORMS, etc. will take more time. Thanks for your patience and interest.
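A minimal sketch of the "pass as is" behavior described above, written in Python for illustration only (EAL itself is R, and this is not its implementation): entries whose `authority_system` is not among the supported authorities named in the 3.0.0 error message are kept and reported in a warning, rather than causing taxonomic coverage to be dropped. Representing template rows as dicts and the function name `check_authorities` are hypothetical.

```python
import warnings

# Supported authorities as listed in the EAL 3.0.0 error message above.
SUPPORTED_AUTHORITIES = {
    "ITIS",
    "World Register of Marine Species",
    "GBIF Backbone Taxonomy",
    "Tropicos - Missouri Botanical Garden",
}


def check_authorities(rows):
    """Hypothetical check: warn about unsupported authorities but keep every row.

    `rows` is a list of dicts, one per entry of taxonomic_coverage.txt.
    Returns the rows unchanged plus the 1-based entry numbers that were flagged.
    Rows with an empty authority_system are not flagged (nothing to support).
    """
    unsupported = []
    for i, row in enumerate(rows, start=1):
        auth = row.get("authority_system")
        if auth and auth not in SUPPORTED_AUTHORITIES:
            unsupported.append(i)
    if unsupported:
        warnings.warn(
            "Unsupported authorities for entries: "
            + ", ".join(str(i) for i in unsupported)
            + ". These entries are passed through as is."
        )
    # Unlike the 3.0.0 behavior, nothing is dropped here.
    return rows, unsupported
```

For example, `check_authorities([{"name": "Eunotia rhynchocephala", "authority_system": "USGS BioData"}])` would emit a warning for entry 1 and still return that row, so it can flow through to the EML.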

clnsmth commented 3 years ago

Hi @juddpatterson. An enhancement addressing your use case (better handling of unsupported taxonomic authorities) has been released to the master branch (version 3.1.0). Please let me know if you encounter any issues.

juddpatterson commented 3 years ago

Thanks @clnsmth, my office tried it yesterday and succeeded in creating an EML/XML file with our unsupported taxonomic authority. In the EML output, the taxonRankName defaults to 'unknown' (as expected), which works for our purposes at this point. Thanks!

One thing I noticed is that while our custom taxonomic_coverage.txt file has values for the 'authority_system' field, those don't make it into the final EML. I'm just learning the EML structure, but what do you think of filling in the 'provider' attribute of the EML taxonId element with the value from 'authority_system' in taxonomic_coverage.txt? Ultimately we may be better off describing and linking to our taxonomy in a more verbose description field, but I just wanted to toss the idea around. I've provided a small example below. Thanks again for the quick help!

Current Output

<taxonomicClassification>
   <taxonRankName>unknown</taxonRankName>
   <taxonRankValue>Eunotia rhynchocephala</taxonRankValue>
</taxonomicClassification>

Potential Output

<taxonomicClassification>
   <taxonRankName>unknown</taxonRankName>
   <taxonRankValue>Eunotia rhynchocephala</taxonRankValue>
   <taxonId provider="USGS BioData"/>
</taxonomicClassification>
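The proposed mapping can be sketched in a few lines of Python with the standard-library `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not EAL's code: the template row is a dict whose `name` key holds the taxon name (a hypothetical key), `authority_system` is copied into the `provider` attribute of `taxonId`, and an optional `authority_id` key (also an assumption here) becomes the element's text. The rank defaults to "unknown", as in the outputs above.

```python
import xml.etree.ElementTree as ET


def taxonomic_classification(row):
    """Build an EML <taxonomicClassification> element for one template row.

    Assumes `row` is a dict with a "name" key (hypothetical; the taxon name),
    an optional "authority_system" key, and an optional, hypothetical
    "authority_id" key.
    """
    tc = ET.Element("taxonomicClassification")
    # Rank is not resolved for unsupported authorities, so use "unknown".
    ET.SubElement(tc, "taxonRankName").text = "unknown"
    ET.SubElement(tc, "taxonRankValue").text = row["name"]
    provider = row.get("authority_system")
    if provider:
        # The proposal: carry authority_system over as taxonId's provider.
        taxon_id = ET.SubElement(tc, "taxonId", provider=provider)
        # If an ID within that authority is known, use it as the element text.
        if row.get("authority_id"):
            taxon_id.text = str(row["authority_id"])
    return tc


row = {"name": "Eunotia rhynchocephala", "authority_system": "USGS BioData"}
print(ET.tostring(taxonomic_classification(row), encoding="unicode"))
```

With no `authority_id`, the `taxonId` element is empty and carries only the provider, matching the "Potential Output" structure above (serialized on a single line).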

clnsmth commented 3 years ago

You're a mind reader @juddpatterson! This is what I was going for in the 3.1.0 implementation! : )

It appears the issue is one of the following: 1) The install didn't load successfully. I sometimes find that restarting the R session is required after installing from GitHub. 2) The updated dependency (taxonomyCleanr 1.5.0) was not installed along with EAL 3.1.0. Usually a warning message is displayed when a dependency's version is outdated.

I'm happy to meet on Zoom to resolve the issue if it persists.

juddpatterson commented 3 years ago

Thanks so much @clnsmth! I was able to get back to this today, and it worked as expected. E.g.:

<taxonomicClassification>
   <taxonRankValue>Achnanthes coarctata (Brébisson ex W. Smith) Grunow</taxonRankValue>
   <taxonId provider="USGS BioData Algae Taxonomy v18.1"/>
</taxonomicClassification>

I believe the issue was with the taxonomyCleanr dependency. After manually updating that to 1.5.0 and rerunning the script, the output included the unsupported taxonId provider. Thanks so much for adding this pathway that allows us to use a custom taxonomic authority.

clnsmth commented 3 years ago

Happy to help @juddpatterson, and thanks for the feature request!