Closed nleguillarme closed 3 years ago
hey @nleguillarme -
> I have a lot of taxon names I'd like to match to the NCBI Taxonomy (because NCBI is actually the only taxonomy with an ontology representation : http://www.obofoundry.org/ontology/ncbitaxon.html)
Shouldn't be too hard to do similar things with other taxonomies, but I can see that it would be easy to reuse an existing resource.
> However, it seems that Global Names Resolver is not able to resolve taxon names tagged as synonyms in NCBI.
Did you consider contacting the Global Names folks about this? (e.g., @dima)
> For instance, Holosticha manca is not resolved as a synonym of Anteholosticha manca : https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=385028
> So I think it would be interesting to have a matcher that directly interacts with the NCBI Taxonomy for name matching, similar to this python package : https://pypi.org/project/ncbi-taxonomist/
I can see how it would be nice to have a fast NCBI name matcher with offline support. A quick glance at the NCBI data tells me that:
.../ncbi-taxa$ cat names.dmp | grep -E "[a-zA-Z]*[ ]+manca"
385028 | Anteholosticha manca (Kahl, 1932) Berger, 2003 | | authority |
385028 | Anteholosticha manca | | scientific name |
385028 | Holosticha manca Kahl, 1932 | | authority |
385028 | Holosticha manca | | synonym |
Because nomer already supports offline matching of ncbi taxa by id, support for matching by (exact) name / synonyms can also be added. Would you use that ?
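To illustrate what exact name/synonym matching against names.dmp could look like, here is a minimal Python sketch. It is not nomer's implementation (nomer is a separate tool, and its internals are not shown in this thread). The function names `parse_names_line`, `build_index` and `match` are hypothetical, and the sketch assumes every row has the four `|`-delimited fields shown above and that names never contain a `|` character. Rows tagged `authority` are deliberately skipped, so only `scientific name` and `synonym` rows can match.

```python
from collections import defaultdict

# The four rows from the grep output above, used as sample input.
SAMPLE_ROWS = [
    "385028 | Anteholosticha manca (Kahl, 1932) Berger, 2003 | | authority |",
    "385028 | Anteholosticha manca | | scientific name |",
    "385028 | Holosticha manca Kahl, 1932 | | authority |",
    "385028 | Holosticha manca | | synonym |",
]


def parse_names_line(line):
    """Split one names.dmp row into (tax_id, name, unique_name, name_class)."""
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    # The trailing "|" leaves an empty fifth field, so keep only the first four.
    tax_id, name, unique_name, name_class = fields[:4]
    return tax_id, name, unique_name, name_class


def build_index(lines):
    """Return (tax_id -> scientific name, name -> {(tax_id, name_class)})."""
    scientific = {}
    by_name = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        tax_id, name, _unique_name, name_class = parse_names_line(line)
        if name_class == "scientific name":
            scientific[tax_id] = name
        if name_class in ("scientific name", "synonym"):
            by_name[name].add((tax_id, name_class))
    return scientific, by_name


def match(name, scientific, by_name):
    """Exact-match a name; return (relation, NCBI id, accepted name) tuples."""
    results = []
    for tax_id, name_class in sorted(by_name.get(name, ())):
        relation = "SAME_AS" if name_class == "scientific name" else "SYNONYM_OF"
        results.append((relation, "NCBI:" + tax_id, scientific.get(tax_id)))
    return results


scientific, by_name = build_index(SAMPLE_ROWS)
print(match("Holosticha manca", scientific, by_name))
```

With the sample rows, looking up Holosticha manca yields a synonym relation pointing at NCBI:385028 (Anteholosticha manca). For the full file, the same functions could be fed from `open("names.dmp", encoding="utf-8")`.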
> Shouldn't be too hard to do similar things with other taxonomies, but I can see that it would be easy to reuse an existing resource.
I agree with you, and this is something I may consider in the future, e.g. exporting the GBIF Backbone taxonomy as an ontology.
> Did you consider contacting the Global Names folks about this? (e.g., @dima)
Well, I checked the GitHub repo of Global Names Resolver: the last commit was 4 years ago, so I was wondering if the project is still alive...
> Because nomer already supports offline matching of ncbi taxa by id, support for matching by (exact) name / synonyms can also be added. Would you use that ?
I would absolutely use that !
@nleguillarme I've implemented a first version of offline-enabled id/name/synonym matching against the NCBI taxonomy.
$ echo -e "\tAriopsis felis\n\tHolosticha manca" | nomer append ncbi-taxon
Ariopsis felis SAME_AS NCBI:75286 Ariopsis felis species root | cellular organisms | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Actinopterygii | Actinopteri | Neopterygii | Teleostei | Osteoglossocephalai | Clupeocephala | Otomorpha | Ostariophysi | Otophysi | Characiphysae | Siluriformes | Siluroidei | Ariidae | Ariopsis | Ariopsis felis NCBI:1 | NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:7898 | NCBI:186623 | NCBI:41665 | NCBI:32443 | NCBI:1489341 | NCBI:186625 | NCBI:186634 | NCBI:32519 | NCBI:186626 | NCBI:186628 | NCBI:7995 | NCBI:1489793 | NCBI:31017 | NCBI:243723 | NCBI:75286 | | superkingdom | clade | kingdom | clade | clade | clade | phylum | subphylum | clade | clade | clade | clade | superclass | class | subclass | infraclass | clade | | cohort | subcohort | clade | superorder | order | suborder | family | genus | species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=75286
Holosticha manca SYNONYM_OF NCBI:385028 Anteholosticha manca species root | cellular organisms | Eukaryota | Sar | Alveolata | Ciliophora | Intramacronucleata | Spirotrichea | Stichotrichia | Urostylida | Holostichidae | Anteholosticha | Anteholosticha manca NCBI:1 | NCBI:131567 | NCBI:2759 | NCBI:2698737 | NCBI:33630 | NCBI:5878 | NCBI:431838 | NCBI:33829 | NCBI:194286 | NCBI:486728 | NCBI:578128 | NCBI:584654 | NCBI:385028 | | superkingdom | clade | clade | phylum | subphylum | class | subclass | order | family | genus | species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=385028
Note that the first run will be slow, because it will download and index a new copy of the NCBI taxonomy as configured.
Also, if you have an existing NCBI cache, please use nomer clean to clear out the old local index first.
I'll work on publishing a new release with this new matcher in it. Thanks for being patient.
The recently created Nomer release https://github.com/globalbioticinteractions/nomer/releases/tag/0.1.24 contains the first pass at the NCBI name/synonym matching you described.
Curious to hear your comments on the new functionality.
It works perfectly.
Converting GBIF taxa to NCBI taxa is not trivial. I now make a first pass with wikidata-taxon-id-web, then try to match on names using globi-taxon-cache, then ncbi-taxon. The synonym information is useful for matching a few more names.
Thank you for your help and your responsiveness, as always.
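The three-pass cascade described above can be sketched as a simple fallback chain in Python. This is only an illustration under stated assumptions. The lookup tables are hypothetical in-memory stand-ins rather than the real nomer matchers, the `GBIF:placeholder-*` identifiers are made up, and which stand-in resolves which record was chosen purely for the example. Only the matcher labels and the two NCBI identifiers come from this thread.

```python
# Hypothetical stand-in tables; real runs would call the corresponding matchers.
ID_TABLE = {"GBIF:placeholder-1": "NCBI:75286"}  # id-based first pass
NAME_CACHE = {}                                  # name-based second pass
NCBI_NAMES = {"Holosticha manca": "NCBI:385028"} # name/synonym third pass

MATCHERS = [
    ("wikidata-taxon-id-web", lambda record: ID_TABLE.get(record["id"])),
    ("globi-taxon-cache", lambda record: NAME_CACHE.get(record["name"])),
    ("ncbi-taxon", lambda record: NCBI_NAMES.get(record["name"])),
]


def resolve(record, matchers):
    """Try each matcher in order; return (label, hit) for the first hit."""
    for label, matcher in matchers:
        hit = matcher(record)
        if hit is not None:
            return label, hit
    # No matcher resolved the record.
    return None, None


records = [
    {"id": "GBIF:placeholder-1", "name": "Ariopsis felis"},
    {"id": "GBIF:placeholder-2", "name": "Holosticha manca"},
]
for record in records:
    print(record["name"], resolve(record, MATCHERS))
```

Trying the id-based pass first means name-based matching, which is more prone to ambiguity, only handles the records the earlier passes leave unresolved.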
Hi @jhpoelen.
I have a lot of taxon names I'd like to match to the NCBI Taxonomy (because NCBI is actually the only taxonomy with an ontology representation : http://www.obofoundry.org/ontology/ncbitaxon.html)
One way to do that is to use Global Names Resolver. However, it seems that Global Names Resolver is not able to resolve taxon names tagged as synonyms in NCBI.
For instance, Holosticha manca is not resolved as a synonym of Anteholosticha manca : https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=385028
So I think it would be interesting to have a matcher that directly interacts with the NCBI Taxonomy for name matching, similar to this python package : https://pypi.org/project/ncbi-taxonomist/