whitfarnum closed this issue 6 months ago.
@whitfarnum thanks for your suggestion, I really like your idea. Adding authorities for the higher-order taxa should take me about 2-4 hours to implement (pessimistic estimate), including testing, preparing a release, etc. Any other (Java) developer could do this too, perhaps after a short learning curve. Any ideas on how to make your neat feature proposal happen?
@jhpoelen I was implementing this via a workaround: I submit species names to get the current higher taxonomy, then isolate the higher taxa and submit only those to Nomer. When I submit the higher taxa, I get their authorities. I then stitch everything together using dictionaries. Everything I do is in Python.
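The stitching step described above can also be sketched with awk, to stay in the shell used for the Nomer commands in this thread. This is a minimal sketch under stated assumptions, not the Python approach used here: it assumes the second-pass results were saved as a two-column TSV (taxon name, authorship) and that paths use the " | " separator seen in resolvedPath. The function name `stitch_authorships` is hypothetical and not part of Nomer.

```shell
# Hypothetical helper (not part of Nomer): stitch_authorships AUTHORITIES_TSV
#   AUTHORITIES_TSV: second-pass results as "taxon name<TAB>authorship" lines
#   stdin:           first-pass paths, one " | "-delimited path per line
#   stdout:          the same paths, each known name followed by its authorship
# Note: uses the NR == FNR idiom, so AUTHORITIES_TSV must not be empty.
stitch_authorships() {
  awk -F'\t' '
    # first file: build a name -> authorship lookup (plays the role of a Python dict)
    NR == FNR { auth[$1] = $2; next }
    # remaining input (stdin): annotate each path element
    {
      n = split($0, names, " [|] ")
      out = ""
      for (i = 1; i <= n; i++) {
        label = names[i]
        if ((label in auth) && auth[label] != "") label = label " " auth[label]
        out = (i == 1) ? label : (out " | " label)
      }
      print out
    }
  ' "$1" -
}
```

For example, with an authorities file mapping Coleoptera to "Linnaeus, 1758" and Scarabaeidae to "Latreille, 1802", the path `Insecta | Coleoptera | Scarabaeidae | Adoretus` becomes `Insecta | Coleoptera Linnaeus, 1758 | Scarabaeidae Latreille, 1802 | Adoretus`.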
OK, neat to see that you are being creative and sharing ideas to improve Nomer. I can see how adding this feature to Nomer would save you time.
Do you guys have a budget to support development of open-source tools like Nomer? If not, I suggest you look into that, because I can't sustain working pro bono, especially for fancy institutions like yours. If so, please let me know how you'd like to compensate me for my time.
About four hours after you first shared your idea, and after roughly 2-3 hours of development, testing, and deployment, I came up with the following working example:
echo -e "\tHomo sapiens" \
| nomer append --include-header ncbi \
| mlr --itsvlite --oxtab cat
This produced the data below (note the populated resolvedPathAuthorships field).
@whitfarnum is this what you had in mind?
providedExternalId
providedName Homo sapiens
relationName SAME_AS
resolvedExternalId NCBI:9606
resolvedName Homo sapiens
resolvedAuthorship Linnaeus, 1758
resolvedRank species
resolvedCommonNames
resolvedPath root | cellular organisms | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens
resolvedPathIds NCBI:1 | NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606
resolvedPathNames | | superkingdom | clade | kingdom | clade | clade | clade | phylum | subphylum | clade | clade | clade | clade | superclass | clade | clade | clade | class | clade | clade | clade | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species
resolvedPathAuthorships | | | Cavalier-Smith 1987 | | | | | | | Cuvier, 1812 | | | | | | | | | Parker & Haswell, 1897 | | | | Linnaeus, 1758 | | | | | Gray, 1825 | | Linnaeus, 1758 | Linnaeus, 1758
resolvedExternalUrl https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
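In the output above, resolvedPath, resolvedPathNames (which, despite its name, holds the rank of each element) and resolvedPathAuthorships are parallel " | "-delimited lists with one entry per path element, so they can be zipped by position. A minimal sketch, assuming the three values are passed as arguments, that an empty entry means no authorship, and that the values contain no backslashes (awk's `-v` interprets escape sequences); `zip_path` is a hypothetical helper, not a Nomer command.

```shell
# Hypothetical helper (not part of Nomer): zip_path PATH RANKS AUTHORSHIPS
#   Each argument is a " | "-delimited list with one entry per path element,
#   e.g. the resolvedPath, resolvedPathNames and resolvedPathAuthorships values.
#   Prints "rank<TAB>name<TAB>authorship" for elements with a non-empty authorship.
zip_path() {
  awk -v path="$1" -v ranks="$2" -v auths="$3" 'BEGIN {
    n = split(path, p, " [|] ")
    split(ranks, r, " [|] ")
    split(auths, a, " [|] ")
    for (i = 1; i <= n; i++)
      if (a[i] != "") printf "%s\t%s\t%s\n", r[i], p[i], a[i]
  }'
}
```

For example, `zip_path "Insecta | Coleoptera | Scarabaeidae" "class | order | family" " | Linnaeus, 1758 | Latreille, 1802"` prints one row for the order and one for the family, skipping Insecta because its authorship entry is empty.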
@jhpoelen yes that is the information I had in mind.
Regarding funding, I will bring this subject up with my supervisor.
Just curious - which Nomer "matcher" or taxonomic resource do you typically use?
Here's an example of the corresponding ITIS results:
providedExternalId
providedName Adoretus
relationName HAS_ACCEPTED_NAME
resolvedExternalId ITIS:187484
resolvedName Adoretus
resolvedAuthorship Dejean, 1833
resolvedRank genus
resolvedCommonNames
resolvedPath Animalia | Bilateria | Protostomia | Ecdysozoa | Arthropoda | Hexapoda | Insecta | Pterygota | Neoptera | Holometabola | Coleoptera | Polyphaga | Scarabeiformia | Scarabaeoidea | Scarabaeidae | Rutelinae | Adoretini | Adoretus
resolvedPathIds ITIS:202423 | ITIS:914154 | ITIS:914155 | ITIS:914158 | ITIS:82696 | ITIS:563886 | ITIS:99208 | ITIS:100500 | ITIS:563890 | ITIS:914213 | ITIS:109216 | ITIS:112747 | ITIS:678302 | ITIS:114486 | ITIS:114493 | ITIS:678509 | ITIS:926256 | ITIS:187484
resolvedPathNames kingdom | subkingdom | infrakingdom | superphylum | phylum | subphylum | class | subclass | infraclass | superorder | order | suborder | infraorder | superfamily | family | subfamily | tribe | genus
resolvedPathAuthorships ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | ITIS:AUTHORSHIP:0 | Linnaeus, 1758 | Emery, 1886 | Crowson, 1960 | Latreille, 1802 | Latreille, 1802 | MacLeay, 1819 | Burmeister, 1844 | Dejean, 1833
resolvedExternalUrl http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=187484
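In these ITIS results, the value ITIS:AUTHORSHIP:0 appears to act as a placeholder for "no authorship recorded": in the v0.5.8 aligned output later in this thread, the same positions of the ITIS path are simply empty. A minimal sketch for normalizing older output to that convention, assuming the placeholder is always exactly `ITIS:AUTHORSHIP:0`; `blank_itis_placeholders` is a hypothetical helper, not part of Nomer.

```shell
# Hypothetical helper (not part of Nomer): blank_itis_placeholders
#   stdin:  " | "-delimited authorship lists, one per line
#   stdout: the same lists with entries equal to ITIS:AUTHORSHIP:0 left empty
blank_itis_placeholders() {
  awk '{
    n = split($0, a, " [|] ")
    out = ""
    for (i = 1; i <= n; i++) {
      # assumption: the placeholder means "no authorship recorded"
      v = (a[i] == "ITIS:AUTHORSHIP:0") ? "" : a[i]
      out = (i == 1) ? v : (out " | " v)
    }
    print out
  }'
}
```

Matching whole entries, rather than running a plain text substitution, avoids touching any authorship string that merely contains the placeholder text.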
Currently we use Catalogue of Life because it has the scarab database. I am currently curating and inventorying our scarabs. Once we are done with scarabs, the plan is to move on to Carabidae, since COL also has the definitive Carabidae catalog at the moment. We are prioritizing groups that have good online resources so we can leverage that research. I am considering writing up our use of Nomer as a curation tool if I ever have time, and I will contact you about authorship if that happens. I know this is not your envisioned use case, but it has been a huge time saver: it has easily taken months off this process.
It is great to hear that Nomer saved you quite some time. I have to say that Nomer is what it is today because of folks like yourself: not shy to try out new methods and open to sharing ideas for improvement.
(I am rebuilding the Catalogue of Life index with the latest dev version of Nomer as we speak, stay tuned...)
@whitfarnum here are the results from the recently rebuilt Catalogue of Life index. Please note that this Catalogue of Life version is the one whose origin and content are packaged in the "Nomer Corpus of Taxonomic Resources" [1]. Is this result what you expected?
echo -e "\tAdoretus" \
| nomer append --include-header col \
| mlr --itsvlite --oxtab cat
yielded:
providedExternalId
providedName Adoretus
relationName HAS_ACCEPTED_NAME
resolvedExternalId COL:PCX
resolvedName Adoretus
resolvedAuthorship Dejean, 1833
resolvedRank genus
resolvedCommonNames
resolvedPath Biota | Animalia | Arthropoda | Insecta | Coleoptera | Scarabaeoidea | Scarabaeidae | Rutelinae | Adoretini | Adoretina | Adoretus
resolvedPathIds COL:5T6MX | COL:N | COL:RT | COL:H6 | COL:C2L | COL:SC | COL:6278C | COL:K9Y | COL:KJT | COL:LBJ | COL:PCX
resolvedPathNames unranked | kingdom | phylum | class | order | superfamily | family | subfamily | tribe | subtribe | genus
resolvedPathAuthorships | | | | | | Latreille, 1802 | MacLeay, 1819 | Burmeister, 1844 | Burmeister, 1844 | Dejean, 1833
resolvedExternalUrl https://www.catalogueoflife.org/data/taxon/PCX
[1] Poelen, J. H. (Ed.). (2024). Nomer Corpus of Taxonomic Resources hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e hash://md5/70645090fdecba640b50577e2a6f2342 (0.23) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10810821
Hey @whitfarnum,
I've released Nomer v0.5.8 with support for the authorship-by-rank fields.
Please find attached alignment-report.zip, as well as the first entry of the report below, shown in xtab format (one field per line) for easier vertical viewing.
Note the various authorship entries by rank, for example:
alignedOrderName Coleoptera
alignedOrderId ITIS:109216
alignedOrderAuthorship Linnaeus, 1758
alignedFamilyName Scarabaeidae
alignedFamilyId ITIS:114493
alignedFamilyAuthorship Latreille, 1802
Including release, testing, communication, etc., this improvement took about 6 hours to complete. Now the big question is: what is this feature worth? I'm curious to hear what your supervisor says about the benefit of having this feature versus the additional time spent without it.
Please review and let me know if this implements your desired functionality.
providedExternalId
providedName Adoretus
parseRelation SAME_AS
parsedExternalId
parsedName Adoretus
parsedAuthority
parsedRank
parsedCommonNames
parsedPath
parsedPathIds
parsedPathNames
parsedPathAuthorships
parsedNameSource gbif-parse
parsedNameSourceUrl https://linker.bio,https://zenodo.org/records/10810821/files,https://zenodo.org/records/10045382/files,https://zenodo.org/records/10037817/files,https://zenodo.org/records/8327611/files
parsedNameSourceAccessedAt hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e
alignRelation HAS_ACCEPTED_NAME
alignedCatalogName itis
alignedExternalId http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=187484
alignedName Adoretus
alignedAuthorship Dejean, 1833
alignedRank genus
alignedCommonNames
alignedKingdomName Animalia
alignedKingdomId ITIS:202423
alignedKingdomAuthorship
alignedPhylumName Arthropoda
alignedPhylumId ITIS:82696
alignedPhylumAuthorship
alignedClassName Insecta
alignedClassId ITIS:99208
alignedClassAuthorship
alignedOrderName Coleoptera
alignedOrderId ITIS:109216
alignedOrderAuthorship Linnaeus, 1758
alignedFamilyName Scarabaeidae
alignedFamilyId ITIS:114493
alignedFamilyAuthorship Latreille, 1802
alignedSubfamilyName Rutelinae
alignedSubfamilyId ITIS:678509
alignedSubfamilyAuthorship MacLeay, 1819
alignedTribeName Adoretini
alignedTribeId ITIS:926256
alignedTribeAuthorship Burmeister, 1844
alignedSubtribeName
alignedSubtribeId
alignedSubtribeAuthorship
alignedGenusName Adoretus
alignedGenusId ITIS:187484
alignedGenusAuthorship Dejean, 1833
alignedSubgenusName
alignedSubgenusId
alignedSubgenusAuthorship
alignedSpeciesName
alignedSpeciesId
alignedSpeciesAuthorship
alignedSubspeciesName
alignedSubspeciesId
alignedSubspeciesAuthorship
alignedPath Animalia | Bilateria | Protostomia | Ecdysozoa | Arthropoda | Hexapoda | Insecta | Pterygota | Neoptera | Holometabola | Coleoptera | Polyphaga | Scarabeiformia | Scarabaeoidea | Scarabaeidae | Rutelinae | Adoretini | Adoretus
alignedPathIds ITIS:202423 | ITIS:914154 | ITIS:914155 | ITIS:914158 | ITIS:82696 | ITIS:563886 | ITIS:99208 | ITIS:100500 | ITIS:563890 | ITIS:914213 | ITIS:109216 | ITIS:112747 | ITIS:678302 | ITIS:114486 | ITIS:114493 | ITIS:678509 | ITIS:926256 | ITIS:187484
alignedPathNames kingdom | subkingdom | infrakingdom | superphylum | phylum | subphylum | class | subclass | infraclass | superorder | order | suborder | infraorder | superfamily | family | subfamily | tribe | genus
alignedPathAuthorships | | | | | | | | | | Linnaeus, 1758 | Emery, 1886 | Crowson, 1960 | Latreille, 1802 | Latreille, 1802 | MacLeay, 1819 | Burmeister, 1844 | Dejean, 1833
alignedNameSource itis
alignedNameSourceUrl https://linker.bio,https://zenodo.org/records/10810821/files,https://zenodo.org/records/10045382/files,https://zenodo.org/records/10037817/files,https://zenodo.org/records/8327611/files
alignedNameSourceAccessedAt hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e
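With per-rank columns such as alignedFamilyName and alignedFamilyAuthorship, the two-pass workaround reduces to picking columns out of a single report. A minimal sketch, assuming the report inside alignment-report.zip is a tab-separated file with a header row that uses the field names shown above (the xtab view being one record of such a table); `select_columns` is a hypothetical helper, not part of Nomer.

```shell
# Hypothetical helper (not part of Nomer): select_columns NAME1,NAME2,...
#   stdin:  a tab-separated table whose first row is a header
#   stdout: only the requested columns, in the requested order;
#           a requested column missing from the header yields empty cells
select_columns() {
  awk -F'\t' -v OFS='\t' -v want="$1" '
    NR == 1 {
      # map header names to column positions
      for (i = 1; i <= NF; i++) pos[$i] = i
      n = split(want, cols, ",")
    }
    {
      line = ""
      for (j = 1; j <= n; j++) {
        c = cols[j]
        if (NR == 1) v = c                     # header row: echo requested names
        else v = (c in pos) ? $(pos[c]) : ""
        line = (j == 1) ? v : (line OFS v)
      }
      print line
    }
  '
}
```

For example, `select_columns alignedFamilyName,alignedFamilyAuthorship < report.tsv` would keep just the family name and its authorship. If Miller is installed, as it is for the `mlr` commands in this thread, its `cut -f` verb does a similar job.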
@whitfarnum let me know when you get a chance to confirm that the newly released Nomer has the functionality you were hoping for.
Apologies for my banter on things related to funding of Nomer activities. Many folks have been generous in the past, and I am simply hoping to open the door to funding while continuing to improve our tools and keep them openly accessible. Thanks for understanding my earlier opportunistic statements.
@whitfarnum let me know if you have additional notes or questions; otherwise, I'll consider this issue closed.
I would like the results to include the alignedAuthority for the higher taxonomy. It would allow me to add authors to our local datasets with less work. I currently have to do it in two passes, where I query the higher taxa on their own to extract the author and year.
The current results are:
alignedName: Adoretus
alignedAuthority: Dejean, 1833
alignedSubfamilyName: Rutelinae
alignedTribeName: Adoretini
I would like fields such as:
alignedSubfamilyNameAuthority: Smith, 1900
alignedTribeNameAuthority: Jones, 1901
This would mean I only need to run Nomer once to get all the information I need.
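For reference, the first half of the two-pass workaround described in this request (isolating the higher taxa so they can be queried on their own) can be sketched as below. This is a minimal sketch, assuming first-pass paths are " | "-delimited like resolvedPath and that the last path element is the originally submitted name; `higher_taxa_input` is hypothetical. Its output uses the same id<TAB>name shape as the `echo -e "\tHomo sapiens"` input to `nomer append` earlier in this thread, with an empty id.

```shell
# Hypothetical helper (not part of Nomer): higher_taxa_input
#   stdin:  first-pass paths, one " | "-delimited path per line
#   stdout: each distinct higher taxon once, as "<TAB>name" lines
#           (empty id, tab, name)
higher_taxa_input() {
  awk '{
    n = split($0, names, " [|] ")
    # stop before the last element, assumed to be the name submitted in the first pass
    for (i = 1; i < n; i++)
      if (names[i] != "" && !(names[i] in seen)) {
        seen[names[i]] = 1
        printf "\t%s\n", names[i]
      }
  }'
}
```

Deduplicating across all input lines keeps the second pass small when many species share the same order or family.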