jhpoelen closed this issue 4 months ago
The Nomer Corpus of Taxonomic Resources related to Nomer v0.5.10 (current version) is:
Poelen, J. H. (ed.). (2024). Nomer Corpus of Taxonomic Resources hash://sha256/3361f03229301a339b86779df0d74ed9ab564b1ef98dda4556ed0a0cafc28700 hash://md5/970d771ac2ff45e42a30b5cf88bf6a1b (0.25) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.12117955
and the most recent copy of NCBI taxonomy was captured on 2022-09-09T20:06:13.047Z with signature hash://sha256/30364d6dd82332e7da3aae6ce5c36a56de5e7d62f28c4490623f0c4cdd7875f6 via https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz because
preston ls --anchor hash://sha256/3361f03229301a339b86779df0d74ed9ab564b1ef98dda4556ed0a0cafc28700 --remote https://linker.bio,https://zenodo.org/records/12117955/files,https://zenodo.org/records/11105453/files/,https://zenodo.org/records/10045382/files/,https://zenodo.org/records/10037817/files/,https://zenodo.org/records/8327611/files/,https://zenodo.org/records/10044989/files/ | grep --before 10 "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz" | grep -P "202[0-9]-[0-9]{2}-[0-9]{2}" | head -1
produced
<urn:uuid:6f2405cb-b26d-4043-8c9a-29bdccaee705> <http://www.w3.org/ns/prov#generatedAtTime> "2022-09-09T20:06:13.047Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:6f2405cb-b26d-4043-8c9a-29bdccaee705> .
with
<https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz> <http://purl.org/pav/hasVersion> <hash://sha256/30364d6dd82332e7da3aae6ce5c36a56de5e7d62f28c4490623f0c4cdd7875f6> <urn:uuid:6d92c3d3-5a7f-4597-b639-cee8995c1cea> .
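The version statement above can also be picked out mechanically. A minimal sketch, assuming the provenance log arrives as N-Quads on stdin and that `grep -o` is available; `version_hashes` is a hypothetical helper name, not a preston subcommand:

```shell
# Print every sha256 content id that the provenance log on stdin links to
# the given URL through a pav:hasVersion statement.
# Hypothetical helper for illustration; not part of preston.
version_hashes() {
  grep -F "<$1> <http://purl.org/pav/hasVersion>" \
    | grep -o 'hash://sha256/[0-9a-f]\{64\}'
}
```

Piping the output of the `preston ls` command above into `version_hashes https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz` should list the content ids recorded for that URL, one per line.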
Inspecting this version of the NCBI taxonomic resource using
preston cat --anchor hash://sha256/3361f03229301a339b86779df0d74ed9ab564b1ef98dda4556ed0a0cafc28700 --remote https://linker.bio,https://zenodo.org/records/12117955/files,https://zenodo.org/records/11105453/files/,https://zenodo.org/records/10045382/files/,https://zenodo.org/records/10037817/files/,https://zenodo.org/records/8327611/files/,https://zenodo.org/records/10044989/files/ 'tar:gz:hash://sha256/30364d6dd82332e7da3aae6ce5c36a56de5e7d62f28c4490623f0c4cdd7875f6!/ncbi.ncbi!/names.dmp' | grep "Endoriftia persephone"
produced
393765 | "Candidatus Endoriftia persephone" Robidart et al. 2008 | | authority |
393765 | Candidatus Endoriftia persephone | | scientific name |
393765 | Endoriftia persephone | | equivalent name |
394104 | Candidatus Endoriftia persephone str. Hot96_1+Hot96_2 | | scientific name |
394104 | Endoriftia persephone 'Hot96_1+Hot96_2' | | equivalent name |
910259 | Candidatus Endoriftia persephone str. Guaymas | | scientific name |
910259 | Endoriftia persephone 'Guaymas' | | synonym |
910259 | Endoriftia persephone str. Guaymas | | synonym |
which indicates that the 2022 copy of NCBI taxonomy already contained the "equivalent name" relation between [Endoriftia persephone] and [Candidatus Endoriftia persephone].
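To see all name classes for a single taxon id instead of grepping by name, a small awk filter can help. This is a sketch that assumes the raw NCBI taxdump layout, in which fields are separated by tab, pipe, tab and each row ends with tab, pipe (the listing above shows these separators as spaces). `names_for_taxid` is a hypothetical helper name.

```shell
# List "name class<TAB>name" pairs for one taxon id from names.dmp on stdin.
# Assumes fields separated by "<TAB>|<TAB>" and rows ending in "<TAB>|".
# Hypothetical helper for illustration only.
names_for_taxid() {
  awk -F '\t[|]\t' -v id="$1" '
    $1 == id {
      class = $4
      sub(/\t[|]$/, "", class)   # strip the row terminator from the last field
      print class "\t" $2
    }'
}
```

Appending `| names_for_taxid 393765` to the `preston cat` command above (in place of the `grep`) should list the scientific name, authority, and equivalent name recorded for that taxon.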
After adding support for NCBI "equivalent to" relations, the following result was obtained using
echo -e "\tEndoriftia persephone"\
| nomer append --include-header ncbi\
| mlr --itsvlite --oxtab cat
yielding:
providedExternalId
providedName Endoriftia persephone
relationName SYNONYM_OF
resolvedExternalId NCBI:393765
resolvedName Candidatus Endoriftia persephone
resolvedAuthorship
resolvedRank species
resolvedCommonNames
resolvedPath root | cellular organisms | Bacteria | Proteobacteria | Gammaproteobacteria | Gammaproteobacteria incertae sedis | sulfur-oxidizing symbionts | Candidatus Endoriftia | Candidatus Endoriftia persephone
resolvedPathIds NCBI:1 | NCBI:131567 | NCBI:2 | NCBI:1224 | NCBI:1236 | NCBI:118884 | NCBI:32036 | NCBI:393764 | NCBI:393765
resolvedPathNames | | superkingdom | phylum | class | | clade | genus | species
resolvedPathAuthorships | | | [class] Stackebrandt et al. 1988 | Garrity et al. 2005 emend. Williams and Kelly 2013 | | | |
resolvedExternalUrl https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=393765
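In the result above, resolvedPath and resolvedPathNames each hold nine pipe-separated entries that appear to line up position by position. A minimal sketch for reading them side by side, assuming that alignment holds and that neither value contains a backslash; `pair_path_with_ranks` is a hypothetical helper name:

```shell
# Pair each entry of a resolvedPath value ($1) with the entry at the same
# position in a resolvedPathNames value ($2). Both are "|"-separated.
# Hypothetical helper; assumes the two lists are aligned by position.
pair_path_with_ranks() {
  awk -v names="$1" -v ranks="$2" 'BEGIN {
    n = split(names, name, /[|]/)
    split(ranks, rank, /[|]/)
    for (i = 1; i <= n; i++) {
      gsub(/^ +| +$/, "", name[i])   # trim surrounding spaces
      gsub(/^ +| +$/, "", rank[i])
      label = (rank[i] == "") ? "(no rank)" : rank[i]
      print label "\t" name[i]
    }
  }'
}
```

Called with the two values from the result above, this prints one "rank, name" line per path element, with "(no rank)" where NCBI assigns none.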
For now, the relation "equivalent to" is translated into "synonym of" until someone proposes a more suitable relation.
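The translation described here can be pictured as a lookup from NCBI name class to relation name. The sketch below is illustrative only and is not Nomer's implementation: only the "equivalent name" to SYNONYM_OF mapping comes from the result above, while the SAME_AS and NONE values and the handling of the "synonym" class are assumptions.

```shell
# Map an NCBI names.dmp name class to a relation name.
# Hypothetical helper: only "equivalent name" -> SYNONYM_OF is taken from
# this issue; the other branches are assumptions for illustration.
relation_for_name_class() {
  case "$1" in
    "scientific name")           echo "SAME_AS" ;;     # assumption
    "synonym"|"equivalent name") echo "SYNONYM_OF" ;;
    *)                           echo "NONE" ;;        # assumption
  esac
}
```

If a more suitable relation is proposed later, only the "equivalent name" branch would need to change.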
In https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=393765&lvl=3&lin=f&keep=1&srchmode=1&unlock
NCBI taxonomy reports equivalence between [Candidatus Endoriftia persephone] and [Endoriftia persephone] but Nomer's NCBI matcher does not via
yields
but . . .
unexpectedly reports no match.
related to https://github.com/globalbioticinteractions/globalbioticinteractions/issues/968 @kbseah