gbif / backbone-feedback

Biota interpreted as a plant #174

Open gbif-portal opened 1 year ago

gbif-portal commented 1 year ago

Biota interpreted as a plant

The OBIS community uses Biota (https://www.marinespecies.org/aphia.php?p=taxdetails&id=2) for identifications where the kingdom is unknown but the organism is known to be living, such as records from eDNA data. GBIF interprets these as an unaccepted synonym, the plant genus Biota (D. Don) Endl., and thus as the terrestrial plant genus Platycladus Spach.

Other datasets where this is happening: https://www.gbif.org/dataset/e24dcb47-89f9-4481-a8ac-c38ef26b2865 https://www.gbif.org/dataset/94054728-2522-48d0-a247-86fe1e600cfa https://www.gbif.org/dataset/69217c7b-5773-4015-a802-6af216b24c97

There are also some moths and fossils being interpreted as the plant: https://www.gbif.org/occurrence/gallery?taxon_key=7326344


Github user: @albenson-usgs Referer: https://www.gbif.org/dataset/e0b59ee7-19ae-4eb0-9217-33317fb50d47

bart-v commented 1 year ago

If GBIF would just use the scientificNameID as provided by OBIS for close to 100% of its records, this would not be an issue. This has been raised before: https://github.com/gbif/pipelines/issues/217

CecSve commented 1 year ago

The workaround would be for publishers to provide taxonRank = Kingdom on occurrences with scientificName = Biota. This is currently the only way GBIF can interpret the occurrences correctly and not assign the wrong taxonomy to the records. If Kingdom is provided as taxonRank, GBIF interprets the record as incertae sedis (unknown), because a kingdom Biota is not in the GBIF backbone but the plant genus Biota is. This is also why unfamiliar taxonRanks such as superdomain end up interpreted as whatever rank has a scientificName match.

Even though the kingdom is technically unknown for the occurrences, GBIF does not use intermediate ranks in the interpretation process.
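For publishers preparing data, the workaround described above amounts to something like the following. This is only a sketch: the helper name `add_kingdom_rank` and the plain-dict record layout are assumptions for illustration; only the Darwin Core terms scientificName and taxonRank and their values come from the comment.

```python
def add_kingdom_rank(records):
    """Return copies of the records with taxonRank set where scientificName is 'Biota'."""
    fixed = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not modified
        if rec.get("scientificName") == "Biota":
            rec["taxonRank"] = "Kingdom"
        fixed.append(rec)
    return fixed
```

Records with any other scientificName pass through unchanged.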

mdoering commented 1 month ago

Sadly, the Biota record in WoRMS is matched to the plant genus in the backbone. So whenever an occurrence claims taxonID=urn:lsid:marinespecies.org:taxname:1, it will become a plant again with the new ID-based matching, regardless of whether the rank is kingdom.

https://api.gbif.org/v1/species/154607070
=> nubKey=7326344 https://api.gbif.org/v1/species/7326344
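To check this mapping yourself, a small sketch like the one below reads the `nubKey` field from a species API response. The helper names `fetch_usage` and `nub_key_of` are made up for illustration; the URL pattern and the `nubKey` field are the ones shown in the links above.

```python
import json
from typing import Optional
from urllib.request import urlopen

API_ROOT = "https://api.gbif.org/v1/species"


def fetch_usage(key: int) -> dict:
    """Fetch one name usage record from the GBIF species API (needs network access)."""
    with urlopen(f"{API_ROOT}/{key}") as response:
        return json.load(response)


def nub_key_of(usage: dict) -> Optional[int]:
    """Return the backbone key a checklist usage is matched to, or None if absent."""
    return usage.get("nubKey")
```

According to the links above, `nub_key_of(fetch_usage(154607070))` would have returned 7326344 when this comment was written; the result may differ if the matching changes.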

bart-v commented 1 month ago

So the problem is the backbone. Should be an easy fix.

mdoering commented 1 month ago

> So the problem is the backbone. Should be an easy fix.

why is that?

bart-v commented 1 month ago

You change the backbone not to do this...

CecSve commented 1 month ago

Currently, this is the process for assigning names where the record contains a scientificNameID (from https://github.com/gbif/pipelines/issues/217):

  1. Detect that scientificNameID contains an identifier we've enabled in configuration based on the prefix of urn:lsid:marinespecies.org
  2. We'd look that up against the reference checklist (we'd configure that prefix to point to the WoRMS checklist) using this API call
  3. The response has the nubKey (the backbone key) which we'd then use to populate the names and necessary backbone identifiers for the record
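The three steps above can be sketched roughly as follows. This is an illustration, not the pipelines code: `resolve_nub_key`, the `lookup` callable and `stub_lookup` are hypothetical names, and the stub stands in for the checklist API call. Only the prefix, the `nubKey` field and the two keys come from this thread.

```python
from typing import Callable, Optional

# Prefix named in step 1; which prefixes are enabled is a configuration matter.
ENABLED_PREFIXES = ("urn:lsid:marinespecies.org",)


def resolve_nub_key(
    scientific_name_id: Optional[str],
    lookup: Callable[[str], Optional[dict]],
) -> Optional[int]:
    """Return the backbone key for an ID-based match, or None if no ID match applies."""
    # Step 1: only identifiers with an enabled prefix are handled.
    if not scientific_name_id or not scientific_name_id.startswith(ENABLED_PREFIXES):
        return None
    # Step 2: look the identifier up in the reference checklist (hypothetical callable).
    usage = lookup(scientific_name_id)
    if usage is None:
        return None
    # Step 3: the response carries the backbone key.
    return usage.get("nubKey")


def stub_lookup(identifier: str) -> Optional[dict]:
    """Stand-in for the checklist lookup, using the keys reported in this thread."""
    table = {
        "urn:lsid:marinespecies.org:taxname:1": {"key": 154607070, "nubKey": 7326344},
    }
    return table.get(identifier)
```

With this flow the rank on the occurrence is never consulted, which is why the Biota identifier resolves to the plant genus whatever taxonRank says.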

> So the problem is the backbone. Should be an easy fix.

Could the solution be to always interpret urn:lsid:marinespecies.org:taxname:1 as incertae sedis and not check with the backbone? As we are moving away from the backbone to assign names to occurrence records, I do not believe we will make such a fix in the backbone itself. It may be something that could be fixed in Catalogue of Life eventually?

mdoering commented 1 month ago

COL has Biota both as the root of all life and as the plant genus, so matching should be fine there. Hardcoding some identifiers in our current pipelines is surely possible, although not nice. @djtfmartin maybe we should also cater for a manual config that overrides any matching results to "fix" things like this?

I still fail to see why the backbone is the problem. It is the matching routines primarily.
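A manual override of the kind suggested above might look something like this. It is purely a sketch of the idea: `MANUAL_OVERRIDES` and `apply_overrides` are hypothetical names, and using `None` to mean "leave unmatched (incertae sedis)" is an assumption, not an existing pipelines convention.

```python
from typing import Optional

# Hypothetical override table: identifier -> forced backbone key.
# None means "do not assign a backbone taxon" (assumed convention).
MANUAL_OVERRIDES = {
    "urn:lsid:marinespecies.org:taxname:1": None,
}


def apply_overrides(identifier: str, matched_nub_key: Optional[int]) -> Optional[int]:
    """Return the override for this identifier if one is configured, else the matched key."""
    if identifier in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[identifier]
    return matched_nub_key
```

The override runs after matching, so the matching routines and the backbone stay untouched and only the listed identifiers are affected.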