gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

mismatched IUCN category in occurrence records due to synonymy #4132

Open ahahn-gbif opened 2 years ago

ahahn-gbif commented 2 years ago

context: occurrence search; some IUCN Global Red List Category assignments to occurrence records are incorrect

example issue: Betula pubescens should be "Least Concern (LC)", but shows as "Critically Endangered (CR)"
see: https://www.gbif.org/occurrence/search?taxon_key=9118014 (Betula pubescens)
look at: expanding the "IUCN Global Red List Category" filter shows most of the records interpreted as "Critically Endangered (CR)"
expecting to see: according to IUCN, Betula pubescens is Least Concern (LC)
responding to: user feedback received from GBIF.ch via helpdesk

Testing the intermediate product (imported IUCN checklist that is used during ingestion of occurrence data when assigning the IUCN categories to occurrence records), the records concerned are

This introduces a second, stand-alone entry for Betula pubescens as "synonym of" in GBIF's representation of the list, one that is not present as such in the original at https://www.iucnredlist.org/search?query=Betula%20pubescens&searchType=species. I do not know the data structure behind the source, but the "Taxonomy in detail" section for Betula klokovii appears to name Betula pubescens only as unlinked text.

It appears that during ingestion, due to this synonymization and the fact that two identical entries for the name Betula pubescens exist, the IUCN category for Betula pubescens is applied from Betula klokovii, which lists the otherwise accepted name Betula pubescens Ehrh. as a synonym.

The critical point seems to be in the generation of two records for the same name during the list import, to express the debated synonym relationship documented in the source. Do we need to reconsider this approach if our intended application is not that of a taxonomic checklist, but rather a species information source?

Tagging @MattBlissett and @mdoering in case I am misinterpreting the situation

ManonGros commented 2 years ago

@ahahn-gbif we discussed the issue here too: https://github.com/gbif/pipelines/issues/495

ahahn-gbif commented 2 years ago

Alternatively, maybe the ingestion step would need to consider that there could be more than one match to a given name in The IUCN Red List of Threatened Species, and prefer an accepted name (with an assigned category) over any synonym?

ahahn-gbif commented 2 years ago

@ahahn-gbif we discussed the issue here too: https://github.com/gbif/pipelines/issues/495

Thanks!

mdoering commented 2 years ago

IUCN is the only source I can find that claims B. pubescens to be a synonym of B. klokovii. On their site it reads:

This species ... is sometimes considered to be an aberrant specimen of the Common Birch, Betula pubescens (A. Sennikov pers. comm. 2016).

I suspect this personal communication is the source of the synonymy.

The problematic response in our API is this one, 9118014 being the accepted and only presence of B. pubescens in the backbone: https://api.gbif.org/v1/species/9118014/iucnRedListCategory

This is because both IUCN records are matched to the same single backbone species, "nubKey":9118014: https://api.gbif.org/v1/species/176838806 and https://api.gbif.org/v1/species/176838773
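The shared match can be checked with a few lines of Python. This is only a sketch: `fetch_usage`, `same_backbone_match` and `check_keys` are hypothetical helper names, and the code assumes the species endpoint returns a JSON object with a `nubKey` field, as in the responses linked above.

```python
import json
from urllib.request import urlopen

API = "https://api.gbif.org/v1/species"


def fetch_usage(key):
    """Fetch one name usage record from the GBIF species API (needs network)."""
    with urlopen(f"{API}/{key}") as resp:
        return json.load(resp)


def same_backbone_match(usages):
    """Return True if every usage carries the same, non-missing nubKey."""
    nub_keys = {u.get("nubKey") for u in usages}
    return len(nub_keys) == 1 and None not in nub_keys


def check_keys(keys):
    """Fetch the given usage keys and report whether they share one backbone match."""
    return same_backbone_match([fetch_usage(k) for k in keys])
```

Calling `check_keys([176838806, 176838773])` would fetch the two IUCN records named above and report whether they point at the same backbone usage.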

To avoid returning the wrong IUCN status we could:

  1. Not match the IUCN synonym to any backbone record. This is a considerable code change in the checklist matching, which is not the same as the occurrence matching.
  2. Prefer the accepted IUCN species over the synonym in case there are multiple records matched to the same nubKey.

Not sure if the latter has other side effects, but that seems like a less invasive and doable change.
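A minimal sketch of what option 2 could look like, not the actual implementation. It assumes each candidate IUCN record is a dict with a `taxonomicStatus` of `"ACCEPTED"` or `"SYNONYM"` and a `category` value; `pick_iucn_record` is a hypothetical name and the sample records are illustrative only.

```python
def pick_iucn_record(candidates):
    """Choose one IUCN record among those matched to the same nubKey.

    Accepted records win over synonyms; among equals, the first one
    in input order is kept.
    """
    if not candidates:
        return None
    # False sorts before True, so accepted records rank first.
    return min(candidates, key=lambda r: r.get("taxonomicStatus") != "ACCEPTED")


# Illustrative records only, not copied from the API.
matched = [
    {"taxonomicStatus": "SYNONYM", "category": "CRITICALLY_ENDANGERED"},
    {"taxonomicStatus": "ACCEPTED", "category": "LEAST_CONCERN"},
]
chosen = pick_iucn_record(matched)
```

With these two sample records the accepted one is chosen, so the category comes from the accepted entry rather than from the synonym. If only a synonym is matched, it is still returned, which leaves the current behaviour unchanged for names without a competing accepted record.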

mdoering commented 2 years ago

Implementation of the 2nd option is simple and can be done immediately.

Looking at the IUCN Red List, there are 2988 names that occur at least twice with the same backbone match. These are genera, species, subspecies, varieties and forms.
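A count like this could be produced roughly as follows. This is a sketch under assumptions: the records are taken to be dicts with `scientificName`, `nubKey` and `rank` fields, and `duplicated_matches_by_rank` is a hypothetical helper, not the query that was actually run.

```python
from collections import Counter


def duplicated_matches_by_rank(records):
    """Tally, per rank, the names that occur more than once with the same nubKey."""
    pairs = Counter(
        (r.get("scientificName"), r.get("nubKey"))
        for r in records
        if r.get("nubKey") is not None
    )
    rank_of = {}
    for r in records:
        pair = (r.get("scientificName"), r.get("nubKey"))
        if pairs.get(pair, 0) > 1:
            # Each duplicated (name, nubKey) pair is counted once.
            rank_of[pair] = r.get("rank")
    return Counter(rank_of.values())
```

Each duplicated name is counted once regardless of how many records share it, so the per-rank totals add up to the number of affected names.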

ahahn-gbif commented 2 years ago

Option 2 sounds very plausible as far as species and subspecific ranks are concerned.

For genera and above, I am less sure - should we expect IUCN categories linked to those at all, and what could the consequences of conflicts around accepted/synonymized higher ranks be?