globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

Get taxon ID in target taxonomy #22

Closed nleguillarme closed 4 years ago

nleguillarme commented 4 years ago

Hi @jhpoelen.

My use case requires converting a taxon ID from a source taxonomy (let's say EOL) to a specific target taxonomy (e.g. GBIF).

I don't think this is possible using nomer, right? I guess the closest I can get is using globi-globalnames with the taxon name.

However, I know that this kind of mapping is done in GloBI, so I was wondering how you did that, and whether it could be reused in nomer?

Best regards.

jhpoelen commented 4 years ago

@nleguillarme thanks for your message!

My use case requires converting a taxon ID from a source taxonomy (let's say EOL) to a specific target taxonomy (e.g. GBIF). I don't think this is possible using nomer, right ?

Yes and no: GloBI uses the GloBI Taxon Graph to link taxon ids across taxonomies. By default, Nomer uses this taxon graph to configure the default matcher (i.e. globi-taxon-cache). And, when asked to link using a taxon id and/or name, the taxon id is attempted first.

So, when matching:

$ echo -e "NCBI:9606\tHomo sapiens" | nomer append 
NCBI:9606   Homo sapiens    SAME_AS NCBI:9606   Homo sapiens    species| Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens    NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606|  |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  |  |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species    http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS GBIF:2436436    Homo sapiens    speciesAnimalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens  GBIF:1 | GBIF:44 | GBIF:359 | GBIF:798 | GBIF:5483 | GBIF:2436435 | GBIF:2436436kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS IRMNG:10857762  Homo sapiens    speciesAnimalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens  IRMNG:11 | IRMNG:148 | IRMNG:1310 | IRMNG:11338 | IRMNG:104701 | IRMNG:1035772 | IRMNG:10857762 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS ITIS:180092 Homo sapiens    speciesAnimalia | Bilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Tetrapoda | Mammalia | Theria | Eutheria | Primates | Haplorrhini | Simiiformes | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens    ITIS:202423 | ITIS:914154 | ITIS:914156 | ITIS:158852 | ITIS:331030 | ITIS:914179 | ITIS:914181 | ITIS:179913 | ITIS:179916 | ITIS:179925 | ITIS:180089 | ITIS:943773 | ITIS:943778 | ITIS:943782 | ITIS:180090 | ITIS:943805 | ITIS:180091 | ITIS:180092   kingdom | subkingdom | infrakingdom | phylum | subphylum | infraphylum | superclass | class | subclass | infraclass | order | suborder | infraorder | superfamily | family | subfamily | genus | species    http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS NCBI:741158 Homo sapiens    subspecies      | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens    NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | NCBI:741158  |  |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  |  |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies   http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS OTT:770315  Homo sapiens    species|  | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens   OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315    |  | domain |  |  | kingdom |  |  |  | phylum | subphylum | subphylum | superclass |  |  | class |  | superclass |  | class | subclass |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species  http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS OTT:933436  Homo sapiens    subspecies      |  | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens   OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | OTT:933436   |  | domain |  |  | kingdom |  |  |  | phylum | subphylum | subphylum | superclass |  |  | class |  | superclass |  | class | subclass |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955 
NCBI:9606   Homo sapiens    SAME_AS WD:Q15978631    Homo sapiens    species| Homo sapiens   WD:Q171283 | WD:Q15978631   | species   https://www.wikidata.org/wiki/Q15978631 

Because Nomer matches by id first, the same results are presented when running:

$ echo -e "NCBI:9606\tDonald duck" | java -jar nomer.jar append
using default matcher [globi-taxon-cache]
NCBI:9606   Donald duck SAME_AS NCBI:9606   Homo sapiens    species| Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens    NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606|  |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  |  |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species    http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS GBIF:2436436    Homo sapiens    speciesAnimalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens  GBIF:1 | GBIF:44 | GBIF:359 | GBIF:798 | GBIF:5483 | GBIF:2436435 | GBIF:2436436kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS IRMNG:10857762  Homo sapiens    speciesAnimalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens  IRMNG:11 | IRMNG:148 | IRMNG:1310 | IRMNG:11338 | IRMNG:104701 | IRMNG:1035772 | IRMNG:10857762 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS ITIS:180092 Homo sapiens    speciesAnimalia | Bilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Tetrapoda | Mammalia | Theria | Eutheria | Primates | Haplorrhini | Simiiformes | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens    ITIS:202423 | ITIS:914154 | ITIS:914156 | ITIS:158852 | ITIS:331030 | ITIS:914179 | ITIS:914181 | ITIS:179913 | ITIS:179916 | ITIS:179925 | ITIS:180089 | ITIS:943773 | ITIS:943778 | ITIS:943782 | ITIS:180090 | ITIS:943805 | ITIS:180091 | ITIS:180092   kingdom | subkingdom | infrakingdom | phylum | subphylum | infraphylum | superclass | class | subclass | infraclass | order | suborder | infraorder | superfamily | family | subfamily | genus | species    http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS NCBI:741158 Homo sapiens    subspecies      | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens    NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | NCBI:741158  |  |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  |  |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies   http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS OTT:770315  Homo sapiens    species|  | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens   OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315    |  | domain |  |  | kingdom |  |  |  | phylum | subphylum | subphylum | superclass |  |  | class |  | superclass |  | class | subclass |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species  http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS OTT:933436  Homo sapiens    subspecies      |  | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens   OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | OTT:933436   |  | domain |  |  | kingdom |  |  |  | phylum | subphylum | subphylum | superclass |  |  | class |  | superclass |  | class | subclass |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955 
NCBI:9606   Donald duck SAME_AS WD:Q15978631    Homo sapiens    species| Homo sapiens   WD:Q171283 | WD:Q15978631   | species   https://www.wikidata.org/wiki/Q15978631 

or, when providing only the id:

$ echo -e "NCBI:9606" | java -jar nomer.jar append
using default matcher [globi-taxon-cache]
NCBI:9606   SAME_AS NCBI:9606   Homo sapiens    species     | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens   NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606    |  |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  |  |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species    http://eol.org/pages/327955 
NCBI:9606   SAME_AS GBIF:2436436    Homo sapiens    species     Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens GBIF:1 | GBIF:44 | GBIF:359 | GBIF:798 | GBIF:5483 | GBIF:2436435 | GBIF:2436436    kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955 
NCBI:9606   SAME_AS IRMNG:10857762  Homo sapiens    species     Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens IRMNG:11 | IRMNG:148 | IRMNG:1310 | IRMNG:11338 | IRMNG:104701 | IRMNG:1035772 | IRMNG:10857762 kingdom | phylum | class | order | family | genus | species http://eol.org/pages/327955 
NCBI:9606   SAME_AS ITIS:180092 Homo sapiens    species     Animalia | Bilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Tetrapoda | Mammalia | Theria | Eutheria | Primates | Haplorrhini | Simiiformes | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens   ITIS:202423 | ITIS:914154 | ITIS:914156 | ITIS:158852 | ITIS:331030 | ITIS:914179 | ITIS:914181 | ITIS:179913 | ITIS:179916 | ITIS:179925 | ITIS:180089 | ITIS:943773 | ITIS:943778 | ITIS:943782 | ITIS:180090 | ITIS:943805 | ITIS:180091 | ITIS:180092   kingdom | subkingdom | infrakingdom | phylum | subphylum | infraphylum | superclass | class | subclass | infraclass | order | suborder | infraorder | superfamily | family | subfamily | genus | species    http://eol.org/pages/327955 
NCBI:9606   SAME_AS NCBI:741158 Homo sapiens    subspecies      | Eukaryota | Opisthokonta | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens    NCBI:131567 | NCBI:2759 | NCBI:33154 | NCBI:33208 | NCBI:6072 | NCBI:33213 | NCBI:33511 | NCBI:7711 | NCBI:89593 | NCBI:7742 | NCBI:7776 | NCBI:117570 | NCBI:117571 | NCBI:8287 | NCBI:1338369 | NCBI:32523 | NCBI:32524 | NCBI:40674 | NCBI:32525 | NCBI:9347 | NCBI:1437010 | NCBI:314146 | NCBI:9443 | NCBI:376913 | NCBI:314293 | NCBI:9526 | NCBI:314295 | NCBI:9604 | NCBI:207598 | NCBI:9605 | NCBI:9606 | NCBI:741158  |  |  | kingdom |  |  |  | phylum | subphylum |  |  |  |  |  |  |  |  | class |  |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies   http://eol.org/pages/327955 
NCBI:9606   SAME_AS OTT:770315  Homo sapiens    species     |  | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens  OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315    |  | domain |  |  | kingdom |  |  |  | phylum | subphylum | subphylum | superclass |  |  | class |  | superclass |  | class | subclass |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species  http://eol.org/pages/327955 
NCBI:9606   SAME_AS OTT:933436  Homo sapiens    subspecies      |  | Eukaryota | Opisthokonta | Holozoa | Metazoa | Eumetazoa | Bilateria | Deuterostomia | Chordata | Craniata | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Dipnotetrapodomorpha | Tetrapoda | Amniota | Mammalia | Theria | Eutheria | Boreoeutheria | Euarchontoglires | Primates | Haplorrhini | Simiiformes | Catarrhini | Hominoidea | Hominidae | Homininae | Homo | Homo sapiens | Homo sapiens   OTT:805080 | OTT:93302 | OTT:304358 | OTT:332573 | OTT:5246131 | OTT:691846 | OTT:641038 | OTT:117569 | OTT:147604 | OTT:125642 | OTT:947318 | OTT:801601 | OTT:278114 | OTT:114656 | OTT:114654 | OTT:458402 | OTT:4940726 | OTT:229562 | OTT:229560 | OTT:244265 | OTT:229558 | OTT:683263 | OTT:5334778 | OTT:392222 | OTT:913935 | OTT:702152 | OTT:386195 | OTT:842867 | OTT:386191 | OTT:770311 | OTT:312031 | OTT:770309 | OTT:770315 | OTT:933436   |  | domain |  |  | kingdom |  |  |  | phylum | subphylum | subphylum | superclass |  |  | class |  | superclass |  | class | subclass |  |  | superorder | order | suborder | infraorder | parvorder | superfamily | family | subfamily | genus | species | subspecies http://eol.org/pages/327955 
NCBI:9606   SAME_AS WD:Q15978631    Homo sapiens    species     | Homo sapiens  WD:Q171283 | WD:Q15978631   | species   https://www.wikidata.org/wiki/Q15978631 
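The id-first behaviour shown by the three runs above can be sketched as follows. This is only an illustration under stated assumptions: the two dictionaries and the `match` function are hypothetical stand-ins, not Nomer's actual data structures or API.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for a taxon graph (not Nomer's internals).
LINKS_BY_ID: Dict[str, List[str]] = {
    "NCBI:9606": ["NCBI:9606", "GBIF:2436436", "ITIS:180092"],
}
LINKS_BY_NAME: Dict[str, List[str]] = {
    "Homo sapiens": ["NCBI:9606", "GBIF:2436436", "ITIS:180092"],
}


def match(taxon_id: Optional[str], name: Optional[str]) -> List[str]:
    """Return linked ids, trying the id first and falling back to the name."""
    if taxon_id and taxon_id in LINKS_BY_ID:
        # A known id wins, even when the provided name disagrees with it.
        return LINKS_BY_ID[taxon_id]
    if name and name in LINKS_BY_NAME:
        return LINKS_BY_NAME[name]
    return []
```

With this sketch, `match("NCBI:9606", "Donald duck")` and `match("NCBI:9606", None)` return the same links, mirroring the runs above.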

While the GloBI Taxon Graph is updated every once in a while (see https://doi.org/10.5281/zenodo.755513), it does not aim to be a complete mapping of all taxon ids out there. Instead, it only covers taxa encountered in GloBI-indexed datasets.

What your use case makes me realize is that various other projects (e.g., Open Tree of Life Taxonomy, Wikidata taxon pages, EOL's Dynamic Hierarchy) maintain a graph of related taxon ids. In fact, Wikidata links were used to populate some of the Wikidata ids that exist in the GloBI Taxon Graph (for methods, see Thessen AE, Poelen JH, Collins M, Hammock J. 2018. 20 GB in 10 minutes: a case for linking major biodiversity databases using an open socio-technical infrastructure and a pragmatic, cross-institutional collaboration. PeerJ Computer Science 4:e164 https://doi.org/10.7717/peerj-cs.164 ).

Would it help your use case to introduce specific matchers that make these id-to-id graphs easier to access?

For instance, I imagine a matcher wikidata:

$ echo -e "NCBI:9606" | nomer append wikidata
NCBI:9606   SAME_AS https://www.wikidata.org/wiki/Q15978631 ...
NCBI:9606   SAME_AS NCBI:9606  ...
NCBI:9606   SAME_AS ITIS:180092  ...
NCBI:9606   SAME_AS GBIF:2436436 ...
NCBI:9606   SAME_AS EOL:327955 ...

This matcher would query Wikidata using the provided NCBI taxon id and retrieve the Wikidata entity (e.g., Q15978631). In addition, all taxon ids across other taxonomies would be included, as reported via https://www.wikidata.org/wiki/Q15978631 .
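A minimal sketch of the kind of query such a matcher might send, assuming the Wikidata properties P685 (NCBI taxonomy ID), P846 (GBIF taxon ID), P830 (Encyclopedia of Life ID) and P815 (ITIS TSN); please verify these against Wikidata before relying on them. The `build_query` helper is hypothetical and only builds the SPARQL text; it does not contact any service and is not how Nomer is implemented.

```python
import re

# Assumed Wikidata properties for external taxon identifiers.
ID_PROPERTIES = {
    "NCBI": "P685",  # NCBI taxonomy ID
    "GBIF": "P846",  # GBIF taxon ID
    "EOL": "P830",   # Encyclopedia of Life ID
    "ITIS": "P815",  # ITIS TSN
}


def build_query(curie: str) -> str:
    """Build a SPARQL query that finds the entity for an id like 'NCBI:9606'."""
    prefix, _, local_id = curie.partition(":")
    # Reject unknown prefixes and ids that could break out of the string literal.
    if prefix not in ID_PROPERTIES or not re.fullmatch(r"[A-Za-z0-9]+", local_id):
        raise ValueError(f"unsupported or malformed id: {curie!r}")
    targets = [(name, prop) for name, prop in ID_PROPERTIES.items() if name != prefix]
    variables = " ".join(f"?{name.lower()}" for name, _ in targets)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?taxon wdt:{prop} ?{name.lower()} . }}" for name, prop in targets
    )
    return (
        f"SELECT ?taxon {variables} WHERE {{\n"
        f'  ?taxon wdt:{ID_PROPERTIES[prefix]} "{local_id}" .\n'
        f"{optionals}\n"
        "}"
    )
```

The resulting text could be pasted into the Wikidata Query Service, where the `wdt:` prefix is predefined.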

I imagine a second, offline-enabled version would be included that would use a published archive, instead of the (slow, unstable) web service / SPARQL endpoints.
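One way an offline lookup could work, sketched under the assumption that the archive can be read as pairs of (Wikidata entity, external id). The pair layout and the `build_index` / `lookup` helpers are hypothetical and do not describe the format of any published archive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Index (entity, external_id) pairs in both directions."""
    entity_by_id: Dict[str, str] = {}
    ids_by_entity: Dict[str, Set[str]] = defaultdict(set)
    for entity, external_id in rows:
        entity_by_id[external_id] = entity
        ids_by_entity[entity].add(external_id)
    return entity_by_id, ids_by_entity


def lookup(
    external_id: str,
    target_prefix: str,
    entity_by_id: Dict[str, str],
    ids_by_entity: Dict[str, Set[str]],
) -> List[str]:
    """Return ids in the target taxonomy that share an entity with external_id."""
    entity = entity_by_id.get(external_id)
    if entity is None:
        return []
    wanted = target_prefix + ":"
    return sorted(i for i in ids_by_entity[entity] if i.startswith(wanted))
```

Building the index once and answering lookups from memory avoids one remote request per id.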

Similar matchers can be provided for other projects that provide cross-taxonomy matches (e.g., Open Tree of Life Taxonomy, EOL's Dynamic Hierarchy, NCBI Taxon LinkOut, etc.).

Thanks again for sharing your use case and let me know if you'd be interested in collaborating on adding more support for id-to-id matchers in Nomer.

nleguillarme commented 4 years ago

@jhpoelen thanks for your reply.

Would it help your use case to introduce specific matchers that make these id-to-id graphs easier to access?

This would be great !

Thanks again for sharing your use case and let me know if you'd be interested in collaborating on adding more support for id-to-id matchers in Nomer.

I will be happy to contribute to the best of my ability.

jhpoelen commented 4 years ago

@nleguillarme glad to hear you are willing to collaborate. I've started an integration to help extract Wikidata taxon links (e.g., https://www.wikidata.org/wiki/Q53636 contains many links to other ids of Anura, an order of amphibians). One thing I was wondering is how Wikidata handles synonyms / unaccepted names (e.g., Arius felis is an unaccepted name of Ariopsis felis). Can you help figure that out?

nleguillarme commented 4 years ago

@jhpoelen it seems like there are two ways knowledge about synonyms is represented in wikidata : https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy/Tutorial#Taxon_synonym

Here is a link to an example query using both methods : https://w.wiki/aKX

Is this what you were looking for?

jhpoelen commented 4 years ago

Very cool, yes that is what I was looking for.

Thanks for sharing the example to look up reported synonyms for specific taxa (see copy below for ease of reading thread):

# Collect reported synonyms (P1420) and alternative labels for one taxon.
SELECT ?taxon
       (GROUP_CONCAT(DISTINCT(?synonym); separator = ", ") AS ?synonym_list)
       (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list)
WHERE
{
  BIND(wd:Q156301 AS ?taxon)                     # Caprifoliaceae
  OPTIONAL { ?taxon wdt:P1420 ?synonym . }       # taxon synonym claims
  OPTIONAL { ?taxon skos:altLabel ?altLabel . }  # alternative labels
}
GROUP BY ?taxon

In following the example, I notice that the definition of taxon synonym ( https://www.wikidata.org/wiki/Property:P1420 ) is "(incorrect) name listed as synonym of a taxon name".

A claim is made that:

Caprifoliaceae Q156301 has (incorrect) synonym P1420 Valerianaceae Q156682.

However, it appears that the inverse claim is also made:

Valerianaceae Q156682 has (incorrect) synonym P1420 Caprifoliaceae Q156301.

So, it appears that these Wikidata claims contradict each other.
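Reciprocal claims like the pair above can be found mechanically. The sketch below assumes the synonym claims are available as (subject, synonym) pairs; the `reciprocal_claims` helper is hypothetical and is not part of Nomer or Wikidata tooling.

```python
from typing import Iterable, List, Tuple


def reciprocal_claims(claims: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return each unordered pair (a, b) for which both a->b and b->a are claimed."""
    seen = set(claims)
    pairs = {
        tuple(sorted((subject, synonym)))
        for subject, synonym in seen
        if subject != synonym and (synonym, subject) in seen
    }
    return sorted(pairs)
```

For the two claims discussed here, `reciprocal_claims([("Q156301", "Q156682"), ("Q156682", "Q156301")])` reports the single pair `("Q156301", "Q156682")`.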

@nleguillarme Just checking my understanding: Do you agree that a contradicting claim is made in above example?

nleguillarme commented 4 years ago

Yes you are right ! I also noticed that some synonyms (e.g. Morinaceae Q133064, Linnaeaceae Q134924) are both instance of taxon Q16521 and instance of 'synonym of Caprifoliaceae'

jhpoelen commented 4 years ago

@nleguillarme thanks for confirming.

I guess any taxon id mapping scheme is expected to have mistakes, including the taxon synonym claims in Wikidata. However, I wonder whether the claims we found are a difference in opinion (e.g., taxonomist A claims that X is a synonym of Y, taxonomist B claims that Y is a synonym of X), or the result of some faulty Wikidata bot. And this makes me wonder: how does Wikidata deal with conflicting expert opinion? Does the person with the biggest Wikidata bot win? Or is there some way to capture and report disputes? I noticed a discussion https://github.com/Wikidata/soweego/issues/220 in the Wikidata project by @marfox @fracorco and @Remper that may be relevant to annotating the confidence / quality of a certain claim.

Regardless, I'd like to propose that we start by supporting an id-to-id mapping via Wikidata in Nomer that does not include synonym resolution yet. We can always add this later. Are you ok with that? If yes, what mapping scheme did you have in mind?

jhpoelen commented 4 years ago

perhaps @qgroom knows about conflict resolution / suspicious wikidata claims - I believe he went to some workshops with wikidata folks.

jhpoelen commented 4 years ago

Hey @nleguillarme - I just added a first pass of the wikidata id matcher to Nomer v0.1.15 .

Now, you can use wikidata to map EOL ids to GBIF ids (or any other supported taxonomy).

Example:

$ echo  "EOL:327955" | nomer append wikidata-taxon-id-web
using matcher [wikidata-taxon-id-web]
EOL:327955  SAME_AS WD:Q15978631    Homo sapiens            Homo sapiens            https://www.wikidata.org/wiki/Q15978631 
EOL:327955  SAME_AS NCBI:9606   Homo sapiens            Homo sapiens            https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606    
EOL:327955  SAME_AS ITIS:180092 Homo sapiens            Homo sapiens            http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=180092    
EOL:327955  SAME_AS EOL:327955  Homo sapiens            Homo sapiens            http://eol.org/pages/327955 
EOL:327955  SAME_AS GBIF:2436436    Homo sapiens            Homo sapiens            http://www.gbif.org/species/2436436 
EOL:327955  SAME_AS MSW:12100795    Homo sapiens            Homo sapiens                
EOL:327955  SAME_AS INAT_TAXON:43584    Homo sapiens            Homo sapiens            https://inaturalist.org/taxa/43584  
EOL:327955  SAME_AS NBN:NHMSYS0000376773    Homo sapiens            Homo sapiens            https://data.nbn.org.uk/Taxa/NHMSYS0000376773   
EOL:327955  SAME_AS IRMNG:10857762  Homo sapiens            Homo sapiens            http://www.marine.csiro.au/mirrorsearch/ir_search.list_species?sp_id=10857762   

If you only want GBIF:

$ echo  "EOL:327955" | nomer append wikidata-taxon-id-web | grep GBIF
using matcher [wikidata-taxon-id-web]
EOL:327955  SAME_AS GBIF:2436436    Homo sapiens            Homo sapiens            http://www.gbif.org/species/2436436 
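Because grep matches anywhere in a line, a stricter filter can look only at the matched id that follows SAME_AS. The sketch below assumes the output is tab-separated (the tabs appear as spaces in this thread); the `matched_ids` helper is hypothetical and not part of Nomer.

```python
from typing import Iterable, List


def matched_ids(lines: Iterable[str], target_prefix: str) -> List[str]:
    """Collect matched ids with the given prefix from append-style output lines."""
    wanted = target_prefix + ":"
    result: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if "SAME_AS" not in fields:
            continue  # skip log lines such as "using matcher [...]"
        position = fields.index("SAME_AS") + 1
        if position < len(fields) and fields[position].startswith(wanted):
            result.append(fields[position])
    return result
```

Unlike a plain text search for EOL, this returns only the row whose matched id is an EOL id, even when every row starts with an EOL id in the first column.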

Also, you can go the other way:

$ echo  "GBIF:2436436" | nomer append wikidata-taxon-id-web | grep EOL
using matcher [wikidata-taxon-id-web]
GBIF:2436436    SAME_AS EOL:327955  Homo sapiens            Homo sapiens            http://eol.org/pages/327955 

@nleguillarme If this functionality helps you with the EOL -> GBIF mapping, please close this issue. Otherwise, please suggest improvements or share comments.

PS Synonym resolution is not yet supported, but I'd be happy to add it if you have some need for it and would like to help. Also, the current version uses individual SPARQL queries against a remote service. For a more performant service, an offline-enabled matcher can be built using a published archive like the one published in:

Poelen, Jorrit. (2018). 20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements (Version 0.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1213477 .

jhpoelen commented 4 years ago

fyi @jhammock @seltmann - Nomer now supports mapping EOL page ids to many taxonomies via wikidata.

qgroom commented 4 years ago

perhaps @qgroom knows about conflict resolution / suspicious wikidata claims - I believe he went to some workshops with wikidata folks.

Wikidata is rather conflicted on biological taxonomy. It conflates scientific name and taxon data. There is no clear resolution to this; it is ultimately a problem with taxonomy that there is no single authority. It is effectively impossible to create a list of taxa without choosing which authorities to follow.

jhpoelen commented 4 years ago

@qgroom thanks for sharing your take on wikidata. I can see how wikidata can provide a useful estimate for which taxa are related to each other (e.g., @nleguillarme 's use case of wanting to get the GBIF id for some EOL page id). And, when cross checking these taxon id relation estimates with other taxon graphs (e.g., Open Tree Taxonomy, EOL Dynamic Hierarchy, NCBI LinkOut) mapping inconsistencies can be detected and additional relations can be inferred (see @diatomsRcool 's paper https://doi.org/10.7717/peerj-cs.164 for some examples).
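The cross-checking idea can be sketched as a comparison of two id mappings. This assumes each taxon graph has been reduced to a dictionary from a source id to a single target id, which is a simplification; the `cross_check` helper is hypothetical.

```python
from typing import Dict, Tuple


def cross_check(
    graph_a: Dict[str, str], graph_b: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Split the source ids both graphs know into agreements and conflicts."""
    agreements: Dict[str, str] = {}
    conflicts: Dict[str, Tuple[str, str]] = {}
    for source_id in graph_a.keys() & graph_b.keys():
        if graph_a[source_id] == graph_b[source_id]:
            agreements[source_id] = graph_a[source_id]
        else:
            # Keep both targets so the disagreement can be reviewed.
            conflicts[source_id] = (graph_a[source_id], graph_b[source_id])
    return agreements, conflicts
```

Ids present in only one graph are left out here; those are candidates for inferring additional relations rather than for detecting inconsistencies.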