globalbioticinteractions / globalbioticinteractions

Global Biotic Interactions provides access to existing species interaction datasets
https://globalbioticinteractions.org
GNU General Public License v3.0

detecting (visually) homonyms like "Ficus" #495

Open jhpoelen opened 4 years ago

jhpoelen commented 4 years ago

as @qgroom noted in a slack message (can't link b/c the slack isn't open ; ( ):

BTW: I spotted one little error. Ficus is both a plant and a gastropod. Is this a usecase for your "refute" template? (edited)

my reply:

@Quentin Groom Thanks for pointing this out. Please record the issue at https://github.com/globalbioticinteractions/globalbioticinteractions/issues/new with an example. And yes, this is an excellent example in which you (as an expert) can refute claims like "flowers of gastropods are visited by bats" :slightly_smiling_face:. Homonyms are expected to cause a fuss (as usual), and detection methods exist beyond spot checking, e.g., https://doi.org/10.7717/peerj-cs.164 .

I realized that many homonyms might be hard to spot / detect for humans, even though the name linkages help to easily detect them computationally.

In the case of Ficus, Nomer (see https://github.com/globalbioticinteractions/nomer), a GloBI name matching tool, nicely reports the homonyms when running $ echo -e "\tFicus" | nomer append, like so:

providedId providedName linktype resolvedId resolvedName resolvedRank resolvedCommonNames resolvedHierachy resolvedHierachyIds resolvedHierachyRanks
  Ficus SAME_AS ITIS:19081 Ficus genus   Plantae | Viridiplantae | Streptophyta | Embryophyta | Tracheophyta | Spermatophytina | Magnoliopsida | Rosanae | Rosales | Moraceae | Ficus ITIS:202422 | ITIS:954898 | ITIS:846494 | ITIS:954900 | ITIS:846496 | ITIS:846504 | ITIS:18063 | ITIS:846548 | ITIS:24057 | ITIS:19063 | ITIS:19081 kingdom | subkingdom | infrakingdom | superphylum | phylum | subphylum | class | superorder | order | family | genus
  Ficus SAME_AS ITIS:73159 Ficus genus   Animalia | Bilateria | Protostomia | Lophozoa | Mollusca | Gastropoda | Neotaenioglossa | Ficidae | Ficus ITIS:202423 | ITIS:914154 | ITIS:914155 | ITIS:914159 | ITIS:69458 | ITIS:69459 | ITIS:566851 | ITIS:73158 | ITIS:73159 kingdom | subkingdom | infrakingdom | superphylum | phylum | class | order | family | genus

However, when a taxonomic context like Plantae is provided, via $ echo -e "\tPlantae | Ficus", the conflict no longer occurs.

To make naming issues easier to detect, I am thinking of labeling taxa that have inconsistent linkages (e.g., homonyms or other ambiguous links).
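As a rough illustration of such labeling, here is a minimal Python sketch that scans tab-separated match results and flags any provided (id, name) pair that links to more than one resolved id. It assumes the column order shown in the output above (provided id, provided name, link type, resolved id, ...); the function name `find_ambiguous_links` is hypothetical and is not part of Nomer or GloBI.

```python
from collections import defaultdict


def find_ambiguous_links(tsv_text):
    """Map each (provided id, provided name) pair to its resolved ids,
    keeping only pairs that resolve to more than one distinct id.

    Assumes tab-separated rows with the provided id in column 1, the
    provided name in column 2, and the resolved id in column 4.
    """
    resolved = defaultdict(set)
    for line in tsv_text.splitlines():
        columns = line.split("\t")
        if len(columns) < 4:
            continue  # skip blank or truncated rows
        provided_id, provided_name, resolved_id = columns[0], columns[1], columns[3]
        if provided_name == "providedName" or not resolved_id:
            continue  # skip a header row and rows without a match
        resolved[(provided_id, provided_name)].add(resolved_id)
    return {key: sorted(ids) for key, ids in resolved.items() if len(ids) > 1}


# Rows abbreviated from the Ficus examples in this thread (hierarchy columns omitted).
sample = (
    "\tFicus\tSAME_AS\tITIS:19081\tFicus\tgenus\n"
    "\tFicus\tSAME_AS\tITIS:73159\tFicus\tgenus\n"
    "ITIS:19081\tFicus\tSAME_AS\tITIS:19081\tFicus\tgenus\n"
)
ambiguous = find_ambiguous_links(sample)
print(ambiguous)
```

In this sketch the bare name Ficus is flagged because it links to two ITIS ids, while the row that supplies ITIS:19081 as context is not.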

jhpoelen commented 4 years ago

Needless to say, I hope someone comes up with a better, simple, and reusable tool that provides scalable, performant, offline-enabled taxonomic name matching, so that we can avoid re-inventing the wheel.

qgroom commented 4 years ago

Rather than refuting the interaction, would it be better if I updated my list with ITIS name IDs? I suspect you're going to want both.

jhpoelen commented 4 years ago

:+1: I very much like your idea to add ITIS ids (or ids of other taxonomic schemes that you prefer) to set a taxonomic context. This would help you explicitly point GloBI (or other users) to the taxa you'd like to include. For instance, using Nomer, the name match against ITIS:19081\tFicus :

$ echo -e "ITIS:19081\tFicus" | nomer append
ITIS:19081  Ficus   SAME_AS ITIS:19081  Ficus   genus       Plantae | Viridiplantae | Streptophyta | Embryophyta | Tracheophyta | Spermatophytina | Magnoliopsida | Rosanae | Rosales | Moraceae | Ficus    ITIS:202422 | ITIS:954898 | ITIS:846494 | ITIS:954900 | ITIS:846496 | ITIS:846504 | ITIS:18063 | ITIS:846548 | ITIS:24057 | ITIS:19063 | ITIS:19081 kingdom | subkingdom | infrakingdom | superphylum | phylum | subphylum | class | superorder | order | family | genus    http://eol.org/pages/60627  
$ 
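For a longer name list, the two-column input used in the commands above could be generated rather than typed by hand. The following is a minimal Python sketch, assuming the input format is simply "id, tab, name" per line with an empty id when none is known (as in the echo examples in this thread); `to_nomer_input` is a hypothetical helper, not part of Nomer.

```python
def to_nomer_input(records):
    """Serialize (taxon id or None, name) pairs as tab-separated lines.

    An absent id becomes an empty first column, mirroring the
    echo -e "\tFicus" example earlier in this thread.
    """
    lines = []
    for taxon_id, name in records:
        lines.append(f"{taxon_id or ''}\t{name}\n")
    return "".join(lines)


# One name with an explicit ITIS context, one without.
nomer_input = to_nomer_input([("ITIS:19081", "Ficus"), (None, "Ficus")])
print(nomer_input, end="")
```

The resulting text could then be written to a file and piped to nomer append in the same way as the echo output above.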

And, time permitting, I suspect that contributing your expert refutations of interaction claims now would help spot suspicious interactions for years to come.

fyi @jhammock @KatjaSchulz

qgroom commented 4 years ago

I started working on this. I primarily used ITIS, but where the name is not in ITIS I had to go elsewhere. What prefixes should I be using with other resources, such as the Catalogue of Life and Index Fungorum?

Also, if the referenced paper uses a synonym of what is now an accepted name, should I correct that name, give the ID of the accepted name, or do something else? For some reason many of these resources do not have identifiers for the synonyms.

jhpoelen commented 4 years ago

@qgroom you can find the prefixes that GloBI currently supports at https://api.globalbioticinteractions.org/prefixes (json) or https://api.globalbioticinteractions.org/prefixes?type=tsv (tsv) or https://api.globalbioticinteractions.org/prefixes?type=csv (csv) . Happy to add support for additional ones.

As far as the transcription goes - I'd personally keep the name as given in the original and leave the taxonomic interpretation to downstream systems: the original won't change, but the taxonomic interpretation might.
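One way to picture this separation is a record that stores the verbatim name once and appends interpretations over time. This is only an illustrative Python sketch: the class and field names (`NameRecord`, `Interpretation`, `verbatim_name`) are hypothetical and do not reflect GloBI's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Interpretation:
    taxon_id: str  # prefixed id, e.g. "ITIS:19081"
    note: str = ""


@dataclass
class NameRecord:
    verbatim_name: str  # the name exactly as written in the source; never edited
    interpretations: List[Interpretation] = field(default_factory=list)

    def interpret(self, taxon_id: str, note: str = "") -> None:
        """Add a new interpretation without touching the verbatim name."""
        self.interpretations.append(Interpretation(taxon_id, note))

    def current_id(self) -> Optional[str]:
        """Return the most recently added taxon id, or None if uninterpreted."""
        return self.interpretations[-1].taxon_id if self.interpretations else None


record = NameRecord("Ficus")
record.interpret("ITIS:19081", "plant genus, set by expert review")
print(record.verbatim_name, record.current_id())
```

If the taxonomy changes later, a new interpretation can be appended while the transcribed name stays as it was.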

jhpoelen commented 4 years ago

btw - @qgroom would you advise GloBI to support Catalogue of Life ids?

qgroom commented 4 years ago

Re: CoL. I'm not sure. They used to use LSIDs, but these don't seem to be displayed now. However, the GUID is still the same in the URL. I'm inclined to think that they are as stable as any other system. These identifiers are available in GBIF too, e.g., https://www.gbif.org/species/153643127/verbatim