gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

help explaining taxonomic interpretation flags #5241

Closed CecSve closed 2 months ago

CecSve commented 3 months ago

We received this issue through the help desk on March 13th. It concerns this dataset: https://www.gbif.org/dataset/9358fbd7-cfd0-4eab-99fa-0934396a0529.

The questions:

  1. It looks like I'm getting a "scientific name ID not found" flag on many AphiaIDs, even though the random smattering of IDs I just checked work when I search in WoRMS. For example: https://www.gbif.org/occurrence/4537420633

  2. Specifically, it looks like the "scientific name ID not found" flag is causing an additional "taxon match higherrank" issue/flag on a smaller number of species (about 18). When I dig into the specific occurrences, e.g. this one for Bossiella mayae, I can confirm that the AphiaID, 1345662, is accepted in WoRMS as a species. But it seems that whatever check GBIF does when the scientific name ID isn't found uses a fuzzy match or something similar, so the record is erroneously defaulted up to genus?

  3. For an even smaller number of species, I'm getting a "taxon match name and ID ambiguous" error, even though, again, when I search a couple of random AphiaIDs, they work fine in WoRMS.

  4. It looks like there are three species names that OBIS flags as not having an accepted match in WoRMS: Hecatonema terminale, Ditylum brightwellii, and Skeletonema dohrnii. When I look at all three in WoRMS, it's true: the species come up as "Status: uncertain > unassessed." So it seems like GBIF should have defaulted up to genus for these rather than accepting the species-level designation?
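To put numbers on questions 1-3, the flags could be tallied per record. The following is a minimal Python sketch, not GBIF's own tooling. It assumes each occurrence record is a dict with a `scientificName` string and an `issues` list, and that the flag identifiers are the upper-case forms of the labels quoted above; both are assumptions that should be checked against the GBIF occurrence issue list. The `SAMPLE` records are hypothetical and only illustrate the shape of the input.

```python
from collections import Counter

# Assumed flag identifiers, derived from the labels quoted in the questions above.
# Verify them against the GBIF documentation before relying on them.
TAXON_FLAGS = {
    "SCIENTIFIC_NAME_ID_NOT_FOUND",
    "TAXON_MATCH_HIGHERRANK",
    "TAXON_MATCH_NAME_AND_ID_AMBIGUOUS",
}


def tally_taxon_flags(records):
    """Count how many records carry each of the taxonomic flags of interest."""
    counts = Counter()
    for record in records:
        for flag in record.get("issues", []):
            if flag in TAXON_FLAGS:
                counts[flag] += 1
    return counts


def species_with_flag(records, flag):
    """Return the sorted, de-duplicated scientific names on records carrying `flag`."""
    return sorted(
        {r["scientificName"] for r in records if flag in r.get("issues", [])}
    )


# Hypothetical records for illustration only; not real GBIF output.
SAMPLE = [
    {
        "scientificName": "Bossiella mayae",
        "issues": ["SCIENTIFIC_NAME_ID_NOT_FOUND", "TAXON_MATCH_HIGHERRANK"],
    },
    {"scientificName": "Ditylum brightwellii", "issues": []},
    {
        "scientificName": "Placeholder name A",
        "issues": ["SCIENTIFIC_NAME_ID_NOT_FOUND"],
    },
]

if __name__ == "__main__":
    print(tally_taxon_flags(SAMPLE))
    print(species_with_flag(SAMPLE, "TAXON_MATCH_HIGHERRANK"))
```

Listing the species behind "taxon match higherrank" separately makes it easier to confirm whether each one also carries the "scientific name ID not found" flag, which is the dependency question 2 asks about.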

CecSve commented 3 months ago

@meghanmshea and @sformel-usgs, there seems to be an issue with the WoRMS checklist in GBIF (and CoL), and we are investigating it. For questions 1-3, the species are currently not in the WoRMS checklist published to GBIF. For question 4, the names are accepted in the backbone, so we match them. In short, the WoRMS database, the WoRMS checklist published to GBIF, and GBIF's backbone are not in sync.

CecSve commented 3 months ago

The bug has been fixed for the next WoRMS release (April 1st, 2024). The explanation is here: https://github.com/gbif/portal-feedback/issues/5239#issuecomment-2007280650

meghanmshea commented 3 months ago

Just wanted to check in on this: not sure how quickly the April 1st WoRMS release gets reflected in GBIF, but I'm not yet seeing any change in these taxonomic flags in the dataset.

ManonGros commented 3 months ago

Hi @meghanmshea, the taxonomic flags will likely remain until the GBIF Backbone taxonomy is updated with the new version of the WoRMS checklist. This usually happens once or twice a year. Thanks for being patient.
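One way to see whether the flags have cleared after a backbone update is to compare per-flag record counts for the dataset over time. The sketch below only builds the query URL for the GBIF occurrence search API, using its `datasetKey`, `issue`, and `limit` parameters. The dataset key comes from the dataset URL in this issue; the flag identifier in the usage line is an assumed upper-case form of the label discussed in this thread, and `flag_count_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"

# Dataset key taken from the dataset URL given at the top of this issue.
DATASET_KEY = "9358fbd7-cfd0-4eab-99fa-0934396a0529"


def flag_count_url(dataset_key, issue):
    """Build a search URL for records in one dataset that carry one issue flag.

    limit=0 asks for no records, so the response carries only the paging
    metadata, including the total number of matching records.
    """
    params = {"datasetKey": dataset_key, "issue": issue, "limit": 0}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed flag identifier; check it against the GBIF issue list.
    print(flag_count_url(DATASET_KEY, "TAXON_MATCH_HIGHERRANK"))
```

Fetching that URL should return JSON whose `count` field is the number of matching records; noting the count now and again after the next backbone release shows whether anything changed.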

sformel-usgs commented 2 months ago

Thanks for figuring this out!

CecSve commented 2 months ago

This issue is actually due to copyright restrictions on the source datasets for WoRMS. It may not be the case for all records, but AlgaeBase data is restricted and cannot be shared outside of WoRMS, which is why the IDs are not interpreted. Unfortunately, we cannot change this, so I will close the issue.

mdoering commented 2 months ago

As GBIF will soon move to the extended Catalogue of Life for its backbone, I have queried ChecklistBank to see which checklists actually treat Bossiella mayae so we can include them: https://www.checklistbank.org/namesindex/3348673/related

Not many, basically just NCBI and OpenTreeOfLife. Neither is a dataset we had planned to include in COL. It is really unfortunate that we cannot use AlgaeBase data.

A way out might be to use Wikispecies, which lists most species of https://species.wikimedia.org/wiki/Bossiella but not yet B. mayae. It could then at least be added manually, if needed, to get it into the next COL/GBIF taxonomy, which we will release every month.