globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

support offline NCBI taxon matching #17

Closed jhpoelen closed 4 years ago

jhpoelen commented 4 years ago

Related to #11.

NCBI Taxonomy is available in bulk at https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/ . Suggest replacing the (slow, network-dependent) NCBI web service calls with matching against a locally cached copy, to make matching faster and more reliable.
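To illustrate the idea, here is a minimal Python sketch of matching against a local copy of the taxonomy dump. It is not nomer's implementation. It assumes the `nodes.dmp` and `names.dmp` layout used in the NCBI taxdump archive (fields separated by tab-pipe-tab, with the first three node columns being tax id, parent tax id, and rank). The helper names (`dmp_line`, `parse_dmp`, `build_index`, `lookup`) are hypothetical, and the sample data is abridged: intermediate taxa are omitted, so the parent of Hominidae is set to the root purely for illustration.

```python
def dmp_line(*fields):
    """Format fields like an NCBI .dmp row: tab-pipe-tab separators, tab-pipe terminator."""
    return "\t|\t".join(fields) + "\t|\n"


def parse_dmp(lines):
    """Yield a list of fields for each non-empty .dmp row."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]
        if line:
            yield line.split("\t|\t")


# Abridged, illustrative sample (NOT the full lineage; parent of 9604 is simplified to root).
NODES = [
    dmp_line("1", "1", "no rank"),
    dmp_line("9604", "1", "family"),
    dmp_line("207598", "9604", "subfamily"),
    dmp_line("9605", "207598", "genus"),
    dmp_line("9606", "9605", "species"),
]
NAMES = [
    dmp_line("1", "root", "", "scientific name"),
    dmp_line("9604", "Hominidae", "", "scientific name"),
    dmp_line("207598", "Homininae", "", "scientific name"),
    dmp_line("9605", "Homo", "", "scientific name"),
    dmp_line("9606", "Homo sapiens", "", "scientific name"),
    dmp_line("9606", "human", "", "genbank common name"),
]


def build_index(nodes_lines, names_lines):
    """Build in-memory maps: tax id -> parent id, rank, and scientific name."""
    parent, rank, name = {}, {}, {}
    for fields in parse_dmp(nodes_lines):
        parent[fields[0]] = fields[1]
        rank[fields[0]] = fields[2]
    for fields in parse_dmp(names_lines):
        if fields[3] == "scientific name":
            name[fields[0]] = fields[1]
    return parent, rank, name


def lineage(tax_id, parent):
    """Walk parent pointers up to the root; return ids ordered root-first."""
    path, seen = [], set()
    while tax_id is not None and tax_id not in seen:
        seen.add(tax_id)
        path.append(tax_id)
        nxt = parent.get(tax_id)
        if nxt == tax_id:  # the root node is its own parent
            break
        tax_id = nxt
    return list(reversed(path))


def lookup(curie, index):
    """Resolve an identifier such as 'NCBI:9606' without any network call."""
    parent, rank, name = index
    prefix, _, tax_id = curie.partition(":")
    if prefix != "NCBI" or tax_id not in parent:
        return None
    ids = lineage(tax_id, parent)
    return {
        "id": curie,
        "name": name.get(tax_id),
        "rank": rank[tax_id],
        "path": [name.get(i) for i in ids],
        "path_ids": ["NCBI:" + i for i in ids],
    }


index = build_index(NODES, NAMES)
match = lookup("NCBI:9606", index)
print(match["name"], match["rank"], " | ".join(match["path"]))
```

With real data, the two lists would be replaced by open file handles on the extracted dump files; the index is built once and reused for every lookup, which is where the speed and reliability gain over per-name web service calls would come from.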

jhpoelen commented 4 years ago

A first pass at offline NCBI taxon lookup is supported in nomer v0.1.10.

jhpoelen commented 4 years ago

usage:

$ echo "NCBI:9606" | nomer append ncbi-taxon-id
using matcher [ncbi-taxon-id]
NCBI taxonomy already indexed at [xxx], no need to import.
NCBI:9606   SAME_AS NCBI:9606   Homo sapiens    species     root | cellular organisms | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens NCBI:1 | NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606   |  | superkingdom |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  | superclass |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species   https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
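The result row above carries three pipe-delimited paths (names, ids, and ranks) that line up position by position, with blank entries where a taxon has no rank. As a rough sketch of how a consumer might align them, the Python below zips the three paths into one record per ancestor. The function names are hypothetical, the sample strings are an abridged excerpt of the row above, and the code deliberately takes the three path strings as arguments rather than assuming column positions in nomer's output.

```python
def split_path(path):
    """Split a pipe-delimited path into trimmed parts, keeping blanks as empty strings."""
    return [part.strip() for part in path.split("|")]


def align_paths(names, ids, ranks):
    """Zip the name, id, and rank paths into one dict per ancestor, root first."""
    name_parts = split_path(names)
    id_parts = split_path(ids)
    rank_parts = split_path(ranks)
    if not (len(name_parts) == len(id_parts) == len(rank_parts)):
        raise ValueError("paths have different lengths")
    return [
        {"id": i, "name": n, "rank": r or None}  # blank rank becomes None
        for n, i, r in zip(name_parts, id_parts, rank_parts)
    ]


def ancestor_at_rank(aligned, rank):
    """Return the first ancestor with the given rank, or None if there is none."""
    for taxon in aligned:
        if taxon["rank"] == rank:
            return taxon
    return None


# Abridged excerpt of the paths in the row above (most ancestors omitted).
names = "root | Hominidae | Homininae | Homo | Homo sapiens"
ids = "NCBI:1 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606"
ranks = " | family | subfamily | genus | species"

aligned = align_paths(names, ids, ranks)
genus = ancestor_at_rank(aligned, "genus")
print(genus["id"], genus["name"])
```

Keeping blank ranks as placeholders rather than dropping them is what keeps the three paths the same length, so a positional zip is enough.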