Open balhoff opened 1 year ago
This situation is slightly worse in Babel 2023jun29, where UMLS:C0326959 is entirely missing, because its semantic type (T012) is no longer mapped correctly into the Biolink model.
The NCBITaxon situation should be an easier fix: it looks like we're only importing "scientific name" and "synonym" (meaning taxonomic synonym, not alternate name) and ignoring "common name" and "genbank common name", which is where the common names live.
The list of possible name_class values we can use, as of the May 1 release of NCBITaxon (I think), is:
25 genbank acronym
230 blast name
667 in-part
2086 acronym
14641 common name
30328 genbank common name
56575 equivalent name
75081 includes
220185 type material
245827 synonym
670412 authority
2503930 scientific name
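For anyone who wants to reproduce a tally like this, here is a minimal sketch. It assumes the standard NCBI taxdump `names.dmp` layout (four fields separated by tab-pipe-tab, each row ending in tab-pipe). The function names, the sample rows, and the tax ID `9999001` are made up for illustration; this is not Babel's actual code.

```python
from collections import Counter

def name_class_of(line):
    """Return the name_class (4th field) of one names.dmp row."""
    line = line.rstrip("\n")
    # Each row ends with a trailing "\t|" terminator; drop it before splitting.
    if line.endswith("\t|"):
        line = line[:-2]
    fields = line.split("\t|\t")  # tax_id, name_txt, unique_name, name_class
    return fields[3]

def count_name_classes(lines):
    """Tally how often each name_class occurs, skipping blank lines."""
    return Counter(name_class_of(line) for line in lines if line.strip())

# Hand-written sample rows in names.dmp layout (placeholder tax ID, not real data).
SAMPLE = [
    "9999001\t|\tPicea abies\t|\t\t|\tscientific name\t|\n",
    "9999001\t|\tPinus abies\t|\t\t|\tsynonym\t|\n",
    "9999001\t|\tNorway spruce\t|\t\t|\tcommon name\t|\n",
    "9999001\t|\tspruce\t|\t\t|\tcommon name\t|\n",
]

counts = count_name_classes(SAMPLE)
# Print in ascending order of count, like the listing above.
for name_class, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{n:>8} {name_class}")
```

To run it against a real release, pass an open file handle for `names.dmp` instead of `SAMPLE`.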
So we definitely want to add "common name" and "genbank common name" so that organism common names will work, and we might want to bring in "equivalent name" and keep "synonym" so we can keep synonyms (e.g. Pinus abies is a synonym of the currently accepted name, Picea abies, so we would expect both to potentially bring back the same taxonomic name). I will need to double-check the rest to make sure we don't need them. I am very surprised but pleased to see the 220,185 references to type material in here!
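As a rough sketch of the proposed change, the import could filter rows against an allow-list of name classes. This assumes the standard taxdump `names.dmp` layout (fields separated by tab-pipe-tab, rows ending in tab-pipe). `WANTED_NAME_CLASSES`, `parse_names_row`, `collect_labels`, the sample rows, and the tax ID `9999001` are hypothetical and only illustrate the idea; the real Babel import code may be organized quite differently.

```python
from collections import defaultdict

# Name classes proposed above; "equivalent name" is the optional one.
WANTED_NAME_CLASSES = frozenset({
    "scientific name",
    "synonym",
    "common name",
    "genbank common name",
    "equivalent name",
})

def parse_names_row(line):
    """Split one names.dmp row into (tax_id, name_txt, unique_name, name_class)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):  # drop the row terminator
        line = line[:-2]
    tax_id, name_txt, unique_name, name_class = line.split("\t|\t")
    return tax_id, name_txt, unique_name, name_class

def collect_labels(lines, wanted=WANTED_NAME_CLASSES):
    """Map each NCBITaxon CURIE to its (name_class, name) pairs, keeping only wanted classes."""
    labels = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        tax_id, name_txt, _unique_name, name_class = parse_names_row(line)
        if name_class in wanted:
            labels[f"NCBITaxon:{tax_id}"].append((name_class, name_txt))
    return dict(labels)

# Hand-written sample rows in names.dmp layout (placeholder tax ID, not real data).
SAMPLE = [
    "9999001\t|\tPicea abies\t|\t\t|\tscientific name\t|\n",
    "9999001\t|\tPicea abies (L.) H.Karst.\t|\t\t|\tauthority\t|\n",
    "9999001\t|\tPinus abies\t|\t\t|\tsynonym\t|\n",
    "9999001\t|\tNorway spruce\t|\t\t|\tcommon name\t|\n",
]

labels = collect_labels(SAMPLE)
for curie, names in labels.items():
    for name_class, name in names:
        print(curie, name_class, name, sep="\t")
```

With an allow-list like this, the "authority" row is dropped while the common name and the taxonomic synonym both end up attached to the same identifier.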
I would rather receive an NCBI taxonomy identifier in most cases. However, there are many species that aren't in NCBI, so some other source might be needed for those (GBIF or Catalog of Life?). One problem example: searching for "american goldfinch", I get this result:
However, the taxonomically valid name for this species is "Spinus tristis" (a synonym here). "Carduelis tristis" is a taxonomic synonym. See https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=54773&lvl=3&lin=f&keep=1&srchmode=1&unlock and https://verifier.globalnames.org/?capitalize=on&format=html&names=Carduelis+tristis