gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Clustering: typification relationship too eager #984

Open timrobertson100 opened 7 months ago

timrobertson100 commented 7 months ago

We're adding typification relationships between holotype records that we've snapped to higher taxa. We shouldn't, of course.

Example: https://www.gbif.org/occurrence/1086482312/cluster

abubelinha commented 2 months ago

Another example: 2821272478 https://www.gbif.org/occurrence/2821272478/cluster (wrongly includes 2805553356 in cluster)

Moreover, for some reason 1935919831 is not in that cluster (despite being a duplicate of 2821272478): https://www.gbif.org/occurrence/1935919831/cluster

abubelinha commented 2 months ago

BTW: something I miss in both normal and cluster api views is the ability to see the provider's verbatim scientificName.

I am writing my own script to review some gbif-detected clusters (as above) and also to try to catch possible clusters which gbif cannot detect.

If I am right, I need two separate api calls for viewing things like stateProvince and verbatim scientificName of gbif-clustered occurrences:

  1. occurrence/{gbifId}/experimental/related (to check for gbif-detected clusters and verify whether they are correct or not)
  2. occurrence/{gbifId}/verbatim (a call for each related occurrence if I want to check provider's original scientificName besides stateProvince or locality ... and do my own verifications)
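
The two calls above could be combined roughly as in the following Python sketch. It rests on assumptions not stated in this thread: that the API base is `https://api.gbif.org/v1`, that the related endpoint returns a `relatedOccurrences` list whose items carry an `occurrence` object with a `gbifId`, and that verbatim records key their fields by full Darwin Core term URIs. The helper names (`related_ids`, `verbatim_fields`, `cluster_report`) are hypothetical.

```python
import json
from urllib.request import urlopen

API = "https://api.gbif.org/v1"        # assumed public API base URL
DWC = "http://rs.tdwg.org/dwc/terms/"  # assumed key prefix in verbatim records


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def related_ids(gbif_id, fetch=fetch_json):
    """Call 1: ids of the occurrences GBIF clustered with gbif_id."""
    data = fetch(f"{API}/occurrence/{gbif_id}/experimental/related")
    ids = []
    # Assumed response shape: {"relatedOccurrences": [{"occurrence": {"gbifId": ...}}]}
    for rel in data.get("relatedOccurrences", []):
        occ = rel.get("occurrence", {})
        key = occ.get("gbifId") or occ.get("key")
        if key is not None:
            ids.append(str(key))
    return ids


def verbatim_fields(gbif_id,
                    fields=("scientificName", "stateProvince", "locality"),
                    fetch=fetch_json):
    """Call 2: the provider's original values for the chosen terms."""
    record = fetch(f"{API}/occurrence/{gbif_id}/verbatim")
    # Missing terms come back as None rather than raising.
    return {name: record.get(DWC + name) for name in fields}


def cluster_report(gbif_id, fetch=fetch_json):
    """Verbatim fields for an occurrence and each of its related records."""
    ids = [str(gbif_id)] + related_ids(gbif_id, fetch)
    return {i: verbatim_fields(i, fetch=fetch) for i in ids}
```

Calling `cluster_report(2821272478)` would issue one request to the related endpoint plus one verbatim request per clustered record, which is the per-occurrence overhead described above. The `fetch` parameter is there only so the parsing can be exercised without network access.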