Closed nleguillarme closed 4 years ago
@nleguillarme thanks for your message!
My use case requires converting a taxon ID from a source taxonomy (let's say EOL) to a specific target taxonomy (e.g. GBIF). I don't think this is possible using nomer, right ?
Yes and no: GloBI uses the GloBI Taxon Graph to link taxon ids across taxonomies. By default, Nomer uses this taxon graph to configure the default matcher (i.e. globi-taxon-cache). And, when asked to link using a taxon id and/or name, the taxon id is attempted first.
So, when matching:
$ echo -e "NCBI:9606\tHomo sapiens" | nomer append
NCBI:9606 Homo sapiens SAME_AS NCBI:9606 Homo sapiens species| Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606| | | kingdom | | | | phylum | subphylum | | | | | | | | | class | | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS GBIF:2436436 Homo sapiens species Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens GBIF:1 | GBIF:44 | GBIF:359 | GBIF:798 | GBIF:5483 | GBIF:2436435 | GBIF:2436436 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS IRMNG:10857762 Homo sapiens species Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens IRMNG:11 | IRMNG:148 | IRMNG:1310 | IRMNG:11338 | IRMNG:104701 | IRMNG:1035772 | IRMNG:10857762 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS ITIS:180092 Homo sapiens species Animalia | Bilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Tetrapoda | Mammalia | Theria | Eutheria | Primates | Haplorrhini | Simiiformes | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens ITIS:202423 | ITIS:914154 | ITIS:914156 | ITIS:158852 | ITIS:331030 | ITIS:914179 | ITIS:914181 | ITIS:179913 | ITIS:179916 | ITIS:179925 | ITIS:180089 | ITIS:943773 | ITIS:943778 | ITIS:943782 | ITIS:180090 | ITIS:943805 | ITIS:180091 | ITIS:180092 kingdom | subkingdom | infrakingdom | phylum | subphylum | infraphylum | superclass | class | subclass | infraclass | order | suborder | infraorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS NCBI:741158 Homo sapiens subspecies | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | NCBI:741158 | | | kingdom | | | | phylum | subphylum | | | | | | | | | class | | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS OTT:770315 Homo sapiens species| | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | | domain | | | kingdom | | | | phylum | subphylum | subphylum | superclass | | | class | | superclass | | class | subclass | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS OTT:933436 Homo sapiens subspecies | | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | OTT:933436 | | domain | | | kingdom | | | | phylum | subphylum | subphylum | superclass | | | class | | superclass | | class | subclass | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955
NCBI:9606 Homo sapiens SAME_AS WD:Q15978631 Homo sapiens species | Homo sapiens WD:Q171283 | WD:Q15978631 | species https://www.wikidata.org/wiki/Q15978631
Because Nomer matches by id first, the same results are returned even when the provided name does not fit the id:
$ echo -e "NCBI:9606\tDonald duck" | java -jar nomer.jar append
using default matcher [globi-taxon-cache]
NCBI:9606 Donald duck SAME_AS NCBI:9606 Homo sapiens species| Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606| | | kingdom | | | | phylum | subphylum | | | | | | | | | class | | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS GBIF:2436436 Homo sapiens species Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens GBIF:1 | GBIF:44 | GBIF:359 | GBIF:798 | GBIF:5483 | GBIF:2436435 | GBIF:2436436 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS IRMNG:10857762 Homo sapiens species Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens IRMNG:11 | IRMNG:148 | IRMNG:1310 | IRMNG:11338 | IRMNG:104701 | IRMNG:1035772 | IRMNG:10857762 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS ITIS:180092 Homo sapiens species Animalia | Bilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Tetrapoda | Mammalia | Theria | Eutheria | Primates | Haplorrhini | Simiiformes | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens ITIS:202423 | ITIS:914154 | ITIS:914156 | ITIS:158852 | ITIS:331030 | ITIS:914179 | ITIS:914181 | ITIS:179913 | ITIS:179916 | ITIS:179925 | ITIS:180089 | ITIS:943773 | ITIS:943778 | ITIS:943782 | ITIS:180090 | ITIS:943805 | ITIS:180091 | ITIS:180092 kingdom | subkingdom | infrakingdom | phylum | subphylum | infraphylum | superclass | class | subclass | infraclass | order | suborder | infraorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS NCBI:741158 Homo sapiens subspecies | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | NCBI:741158 | | | kingdom | | | | phylum | subphylum | | | | | | | | | class | | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS OTT:770315 Homo sapiens species| | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | | domain | | | kingdom | | | | phylum | subphylum | subphylum | superclass | | | class | | superclass | | class | subclass | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS OTT:933436 Homo sapiens subspecies | | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | OTT:933436 | | domain | | | kingdom | | | | phylum | subphylum | subphylum | superclass | | | class | | superclass | | class | subclass | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955
NCBI:9606 Donald duck SAME_AS WD:Q15978631 Homo sapiens species | Homo sapiens WD:Q171283 | WD:Q15978631 | species https://www.wikidata.org/wiki/Q15978631
or, when using:
$ echo -e "NCBI:9606" | java -jar nomer.jar append
using default matcher [globi-taxon-cache]
NCBI:9606 SAME_AS NCBI:9606 Homo sapiens species | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | | | kingdom | | | | phylum | subphylum | | | | | | | | | class | | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 SAME_AS GBIF:2436436 Homo sapiens species Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens GBIF:1 | GBIF:44 | GBIF:359 | GBIF:798 | GBIF:5483 | GBIF:2436435 | GBIF:2436436 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955
NCBI:9606 SAME_AS IRMNG:10857762 Homo sapiens species Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens IRMNG:11 | IRMNG:148 | IRMNG:1310 | IRMNG:11338 | IRMNG:104701 | IRMNG:1035772 | IRMNG:10857762 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955
NCBI:9606 SAME_AS ITIS:180092 Homo sapiens species Animalia | Bilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Tetrapoda | Mammalia | Theria | Eutheria | Primates | Haplorrhini | Simiiformes | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens ITIS:202423 | ITIS:914154 | ITIS:914156 | ITIS:158852 | ITIS:331030 | ITIS:914179 | ITIS:914181 | ITIS:179913 | ITIS:179916 | ITIS:179925 | ITIS:180089 | ITIS:943773 | ITIS:943778 | ITIS:943782 | ITIS:180090 | ITIS:943805 | ITIS:180091 | ITIS:180092 kingdom | subkingdom | infrakingdom | phylum | subphylum | infraphylum | superclass | class | subclass | infraclass | order | suborder | infraorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 SAME_AS NCBI:741158 Homo sapiens subspecies | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | NCBI:741158 | | | kingdom | | | | phylum | subphylum | | | | | | | | | class | | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955
NCBI:9606 SAME_AS OTT:770315 Homo sapiens species | | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | | domain | | | kingdom | | | | phylum | subphylum | subphylum | superclass | | | class | | superclass | | class | subclass | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species http://eol.org/pages/327955
NCBI:9606 SAME_AS OTT:933436 Homo sapiens subspecies | | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | OTT:933436 | | domain | | | kingdom | | | | phylum | subphylum | subphylum | superclass | | | class | | superclass | | class | subclass | | | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955
NCBI:9606 SAME_AS WD:Q15978631 Homo sapiens species | Homo sapiens WD:Q171283 | WD:Q15978631 | species https://www.wikidata.org/wiki/Q15978631
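The rows above are long, and for an id-to-id use case only two fields matter: the provided id and the resolved id. A minimal sketch of how they could be pulled out, assuming the real output is tab-separated and that the resolved id sits in the field right after the SAME_AS field (the function name id_pairs is made up for illustration and is not part of Nomer):

```shell
# Print "provided id <TAB> resolved id" for every SAME_AS row read from stdin.
# Assumption: tab-separated rows, resolved id directly follows the SAME_AS field.
id_pairs() {
  awk -F'\t' '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "SAME_AS") { print $1 "\t" $(i + 1); break }
      }
    }'
}
```

Usage would look like `echo -e "NCBI:9606\tHomo sapiens" | nomer append | id_pairs`. Searching for the SAME_AS field instead of hard-coding a column number keeps the sketch independent of how many input columns were provided.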
While the GloBI Taxon Graph is updated every once in a while (see https://doi.org/10.5281/zenodo.755513), it does not aim to be a complete mapping of all taxon ids out there. Instead, it only covers taxa encountered in GloBI-indexed datasets.
What your use case makes me realize is that various other projects (e.g., Open Tree of Life Taxonomy, Wikidata taxon pages, EOL's dynamic hierarchy) maintain a graph of related taxon ids. In fact, Wikidata links were used to populate parts of the Wikidata ids that exist in the GloBI Taxon Graph (for methods, see Thessen AE, Poelen JH, Collins M, Hammock J. 2018. 20 GB in 10 minutes: a case for linking major biodiversity databases using an open socio-technical infrastructure and a pragmatic, cross-institutional collaboration. PeerJ Computer Science 4:e164 https://doi.org/10.7717/peerj-cs.164 ).
Would it help your use-case to introduce specific matchers that make these id-to-id graphs more easy to access?
For instance, I imagine a matcher wikidata:
$ echo -e "NCBI:9606" | nomer append wikidata
NCBI:9606 SAME_AS https://www.wikidata.org/wiki/Q15978631 ...
NCBI:9606 SAME_AS NCBI:9606 ...
NCBI:9606 SAME_AS ITIS:180092 ...
NCBI:9606 SAME_AS GBIF:2436436 ...
NCBI:9606 SAME_AS EOL:327955 ...
This matcher would query Wikidata using the provided NCBI taxon id and retrieve the Wikidata entity (e.g., Q15978631). In addition, all taxon ids across other taxonomies would be included, as reported via https://www.wikidata.org/wiki/Q15978631 .
I imagine a second, offline-enabled version would be included that would use a published archive instead of the (slow, unstable) web service / SPARQL endpoints.
Similar matchers can be provided for other projects that provide cross-taxonomy matches (e.g., Open Tree of Life Taxonomy, EOL's dynamic hierarchy, NCBI Taxon Linkout etc.).
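To make the idea a bit more concrete, here is a rough sketch of the kind of lookup a Wikidata-backed matcher might perform. It is not how Nomer does (or will do) it. It assumes the public Wikidata SPARQL endpoint at https://query.wikidata.org/sparql and the Wikidata properties P685 (NCBI taxonomy id), P846 (GBIF taxon id), P830 (EOL id) and P815 (ITIS TSN); the function names and the User-Agent string are invented for this example.

```shell
# Print a SPARQL query that finds the Wikidata entity carrying a given
# NCBI taxonomy id, plus ids from a few other taxonomies when present.
# Accepts "NCBI:9606" or "9606".
build_query() {
  ncbi_id="${1#NCBI:}"   # drop the "NCBI:" prefix if present
  cat <<EOF
SELECT ?taxon ?gbif ?eol ?itis WHERE {
  ?taxon wdt:P685 "${ncbi_id}" .
  OPTIONAL { ?taxon wdt:P846 ?gbif . }
  OPTIONAL { ?taxon wdt:P830 ?eol . }
  OPTIONAL { ?taxon wdt:P815 ?itis . }
}
EOF
}

# Send the query to the public endpoint and print the result as CSV.
# Needs network access; the User-Agent value is a hypothetical placeholder.
lookup_ncbi() {
  curl -sG 'https://query.wikidata.org/sparql' \
    -H 'Accept: text/csv' \
    -A 'taxon-id-sketch/0.1 (hypothetical example)' \
    --data-urlencode "query=$(build_query "$1")"
}
```

Calling `lookup_ncbi NCBI:9606` would issue one remote query per id, which is exactly the slow path that an offline archive is meant to avoid.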
Thanks again for sharing your use case and let me know if you'd be interested in collaborating on adding more support for id-to-id matchers in Nomer.
@jhpoelen thanks for your reply.
Would it help your use-case to introduce specific matchers that make these id-to-id graphs more easy to access?
This would be great !
Thanks again for sharing your use case and let me know if you'd be interested in collaborating on adding more support for id-to-id matchers in Nomer.
I will be happy to contribute to the best of my ability.
@nleguillarme glad to hear you are willing to collaborate. I've started an integration to help extract Wikidata taxon links (e.g., https://www.wikidata.org/wiki/Q53636 contains many links to other ids of Anura, an order of amphibians). One thing I was wondering is how Wikidata handles synonyms / unaccepted names (e.g., Arius felis is an unaccepted name of Ariopsis felis). Can you help figure that out?
@jhpoelen it seems like there are two ways knowledge about synonyms is represented in wikidata : https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy/Tutorial#Taxon_synonym
Here is a link to an example query using both methods : https://w.wiki/aKX
Is it what you were looking for ?
Very cool, yes that is what I was looking for.
Thanks for sharing the example to look up reported synonyms for specific taxa (see copy below for ease of reading the thread):
SELECT ?taxon (GROUP_CONCAT(DISTINCT(?synonym); separator = ", ") AS ?synonym_list) (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list)
WHERE
{
  BIND(wd:Q156301 AS ?taxon)
  # method 1: explicit "taxon synonym" claims (P1420)
  OPTIONAL { ?taxon wdt:P1420 ?synonym . }
  # method 2: alternative labels attached to the taxon item
  OPTIONAL { ?taxon skos:altLabel ?altLabel . }
}
GROUP BY ?taxon
In following the example, I notice that the definition of taxon synonym ( https://www.wikidata.org/wiki/Property:P1420 ) is "(incorrect) name listed as synonym of a taxon name".
A claim is made that:
Caprifoliaceae Q156301 has (incorrect) synonym P1420 Valerianaceae Q156682.
However, it appears that the inverse claim is also made:
Valerianaceae Q156682 has (incorrect) synonym P1420 Caprifoliaceae Q156301.
So, it appears that these Wikidata claims contradict each other.
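Such reciprocal claims could be spotted mechanically. A small sketch, assuming the P1420 claims have already been exported to a two-column, tab-separated list of "taxon, claimed synonym" (that export format and the function name are assumptions made for this example):

```shell
# Read "taxon <TAB> synonym" rows from stdin and print each pair for which
# the inverse claim (synonym <TAB> taxon) was seen on an earlier row.
find_reciprocal() {
  awk -F'\t' '
    {
      key = $1 FS $2
      rev = $2 FS $1
      if ($1 != $2 && (rev in seen)) print $2 "\t" $1
      seen[key] = 1
    }'
}
```

For the example above, feeding in the two rows Q156301/Q156682 and Q156682/Q156301 would print the pair once. The check only flags that both directions exist; it cannot tell whether that reflects a difference in expert opinion or a faulty import.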
@nleguillarme Just checking my understanding: Do you agree that a contradicting claim is made in above example?
@nleguillarme thanks for confirming.
I guess any taxon id mapping scheme is expected to have mistakes, including the taxon synonym claims in Wikidata. However, I wonder whether the claims we found are a difference in opinion (e.g., taxonomist A claims that X is a synonym of Y, taxonomist B claims that Y is a synonym of X), or the result of some faulty Wikidata bot. And this makes me wonder: how does Wikidata deal with conflicting expert opinion? Does the person with the biggest Wikidata bot win? Or is there some way to capture and report disputes? I noticed a discussion https://github.com/Wikidata/soweego/issues/220 in a Wikidata project by @marfox @fracorco and @Remper that may be relevant to annotating the confidence / quality of a certain claim.
Regardless, I'd like to propose starting with support in Nomer for an id-to-id mapping via Wikidata that does not include synonym resolution yet. We can always add this later. Are you ok with that? If yes, what mapping scheme did you have in mind?
perhaps @qgroom knows about conflict resolution / suspicious wikidata claims - I believe he went to some workshops with wikidata folks.
Hey @nleguillarme - I just added a first pass of the wikidata id matcher to Nomer v0.1.15 .
Now, you can use wikidata to map EOL ids to GBIF ids (or any other supported taxonomy).
Example:
$ echo "EOL:327955" | nomer append wikidata-taxon-id-web
using matcher [wikidata-taxon-id-web]
EOL:327955 SAME_AS WD:Q15978631 Homo sapiens Homo sapiens https://www.wikidata.org/wiki/Q15978631
EOL:327955 SAME_AS NCBI:9606 Homo sapiens Homo sapiens https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
EOL:327955 SAME_AS ITIS:180092 Homo sapiens Homo sapiens http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=180092
EOL:327955 SAME_AS EOL:327955 Homo sapiens Homo sapiens http://eol.org/pages/327955
EOL:327955 SAME_AS GBIF:2436436 Homo sapiens Homo sapiens http://www.gbif.org/species/2436436
EOL:327955 SAME_AS MSW:12100795 Homo sapiens Homo sapiens
EOL:327955 SAME_AS INAT_TAXON:43584 Homo sapiens Homo sapiens https://inaturalist.org/taxa/43584
EOL:327955 SAME_AS NBN:NHMSYS0000376773 Homo sapiens Homo sapiens https://data.nbn.org.uk/Taxa/NHMSYS0000376773
EOL:327955 SAME_AS IRMNG:10857762 Homo sapiens Homo sapiens http://www.marine.csiro.au/mirrorsearch/ir_search.list_species?sp_id=10857762
If you only want GBIF:
$ echo "EOL:327955" | nomer append wikidata-taxon-id-web | grep GBIF
using matcher [wikidata-taxon-id-web]
EOL:327955 SAME_AS GBIF:2436436 Homo sapiens Homo sapiens http://www.gbif.org/species/2436436
Also, you can go the other way:
$ echo "GBIF:2436436" | nomer append wikidata-taxon-id-web | grep EOL
using matcher [wikidata-taxon-id-web]
GBIF:2436436 SAME_AS EOL:327955 Homo sapiens Homo sapiens http://eol.org/pages/327955
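One caveat with plain grep: it matches the prefix anywhere in the row, so `grep GBIF` on a lookup that starts from a GBIF id would keep every row. A stricter filter can test only the resolved id. A minimal sketch, assuming tab-separated output with the resolved id in the field right after SAME_AS (filter_target is a made-up helper, not a Nomer command):

```shell
# Keep only rows whose resolved id (the field right after SAME_AS)
# starts with the given taxonomy prefix, e.g. "GBIF" or "EOL".
filter_target() {
  awk -F'\t' -v prefix="$1:" '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "SAME_AS" && index($(i + 1), prefix) == 1) { print; break }
      }
    }'
}
```

It would be used like `echo "GBIF:2436436" | nomer append wikidata-taxon-id-web | filter_target EOL`.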
@nleguillarme If this functionality helps you with the EOL -> GBIF mapping, please close this issue. Otherwise, please suggest improvements or leave comments.
PS Synonym resolution is not yet supported, but I'd be happy to add it if you have some need for it and would like to help. Also, the current version issues individual SPARQL queries against a remote service. For a more performant service, an offline-enabled matcher can be built using a published archive like the one published in:
Poelen, Jorrit. (2018). 20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements (Version 0.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1213477 .
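As a rough illustration of what an offline lookup could look like: the sketch below assumes the archive has been reduced to a local two-column, tab-separated file of linked ids. That file layout, the file name in the usage line, and the function name are assumptions for this example and do not describe the actual contents of the Zenodo archive.

```shell
# Print every id linked to the given id ($1) in a local two-column TSV ($2).
# Links are treated as symmetric, so both columns are checked.
lookup_offline() {
  awk -F'\t' -v id="$1" '
    $1 == id { print $2 }
    $2 == id { print $1 }
  ' "$2"
}
```

For example, `lookup_offline EOL:327955 links.tsv` would list all ids linked to the EOL page without any network round trip. A single pass over a flat file is the simplest option; for large archives a sorted or indexed file would likely be needed.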
fyi @jhammock @seltmann - Nomer now supports mapping EOL page ids to many taxonomies via wikidata.
perhaps @qgroom knows about conflict resolution / suspicious wikidata claims - I believe he went to some workshops with wikidata folks.
Wikidata is rather conflicted on biological taxonomy. It conflates scientific name and taxon data. There is no clear resolution to this; it is ultimately a problem with taxonomy that there is no single authority. It is effectively impossible to create a list of taxa without choosing which authorities to follow.
@qgroom thanks for sharing your take on wikidata. I can see how wikidata can provide a useful estimate for which taxa are related to each other (e.g., @nleguillarme 's use case of wanting to get the GBIF id for some EOL page id). And, when cross checking these taxon id relation estimates with other taxon graphs (e.g., Open Tree Taxonomy, EOL Dynamic Hierarchy, NCBI LinkOut) mapping inconsistencies can be detected and additional relations can be inferred (see @diatomsRcool 's paper https://doi.org/10.7717/peerj-cs.164 for some examples).
Hi @jhpoelen.
My use case requires converting a taxon ID from a source taxonomy (let's say EOL) to a specific target taxonomy (e.g. GBIF).
I don't think this is possible using nomer, right ? I guess the closest I can get is using globi-globalnames with the taxon name.
However, I know that this kind of mapping is done in GloBI, so I was wondering how you did that, and whether it could be reused in nomer ?
Best regards.