gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

GBIF occurrences duplicates #3459

Closed. Dina-Sharafeldeen closed this issue 3 years ago.

Dina-Sharafeldeen commented 3 years ago

Hi, we are working with an occurrence dataset downloaded from GBIF. We noticed that some records of the same species, with identical values in all other attributes, have different gbifIDs.
Please find a sample of this case at the following link:
https://drive.google.com/file/d/1D7WfjvKbBCzRj2uBz_f0_qkDnHinhRUs/view?usp=sharing
Should we consider these records duplicates? If so, which ones should we keep and which ones should we delete? Thanks, Dina Sharafeldeen

MortenHofft commented 3 years ago

That is how the data is published: the catalog number and the organismID are different (for the example I looked at). So according to the dataset publisher, these are different organisms that happen to be of the same species and were collected at the same time on the same day. At least, that is how it appears to me.
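To see this for yourself across the whole download, one option is to group records that are identical once identifier columns are ignored, then print the identifiers of each group. The following is a minimal sketch using only the Python standard library. The set of identifier columns (`gbifID`, `occurrenceID`, `catalogNumber`, `organismID`), the helper names `find_apparent_duplicates` and `load_tsv`, and the sample rows are all hypothetical; adjust the column set to what your download actually contains. It assumes a tab-delimited export with unquoted fields.

```python
import csv
import io
from collections import defaultdict

# Hypothetical set of identifier columns; adjust to the columns in your download.
ID_COLUMNS = {"gbifID", "occurrenceID", "catalogNumber", "organismID"}


def load_tsv(text):
    """Parse a tab-delimited export into row dicts (assumes unquoted fields)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))


def find_apparent_duplicates(rows, id_columns=ID_COLUMNS):
    """Group rows that are identical once identifier columns are ignored.

    Returns a list of groups (lists of row dicts) containing more than one row.
    """
    groups = defaultdict(list)
    for row in rows:
        # Hashable key built from every non-identifier column, sorted by column name
        key = tuple(sorted((k, v) for k, v in row.items() if k not in id_columns))
        groups[key].append(row)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical sample: records 1 and 2 match on everything except their identifiers.
sample = (
    "gbifID\tspecies\teventDate\tcatalogNumber\torganismID\n"
    "1\tPasser domesticus\t2015-05-01\tCAT-1\tORG-A\n"
    "2\tPasser domesticus\t2015-05-01\tCAT-2\tORG-B\n"
    "3\tParus major\t2015-05-01\tCAT-3\tORG-C\n"
)
for group in find_apparent_duplicates(load_tsv(sample)):
    print([(r["gbifID"], r["catalogNumber"], r["organismID"]) for r in group])
# prints [('1', 'CAT-1', 'ORG-A'), ('2', 'CAT-2', 'ORG-B')]
```

If the records in a group carry different catalog numbers and organismIDs, as in the example above, the publisher is describing them as distinct organisms rather than as accidental copies of one record.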

@rukayaj, perhaps you can answer this? Better than I can, at least.

rukayaj commented 3 years ago

@Dina-Sharafeldeen I think the records you refer to are from this dataset, right? https://www.gbif.org/dataset/264e6a66-9c9e-4115-9aec-29d694c68097#description The data curator added some explanatory text in the dataset description:

An accession generally refers to a given individual at a specific place and time; for each accession, one or more items may exist. Examples of item types include preserved skins, feathers or eggshells, blood, tissue or sperm samples, footage of live sperm motility, microscope slides prepared for sperm morphometric analyses, testes and extracted DNA.

I think it would be clearer if the description included some of the information we have in our internal wiki. I will talk to the data curators about rewording it and adding to it, but in the meantime, here is the relevant text pasted verbatim from the wiki:

Also worth noting is this massive thread https://github.com/tdwg/dwc/issues/314 on the nature of MaterialSamples vs Occurrences. GBIF Norway is following it closely, and we will of course adapt to whatever the community decides are the best practices.

So anyway, the records you are seeing may possibly be from the same individual. Can you look at the organismID and see what it says? I will have time to look into this more deeply later next week.
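Checking the organismID across a download can be sketched as grouping records by that column: an organismID shared by several gbifIDs would suggest several items (for example a skin and a blood sample) from one accession, which is not the same as a duplicated record. This is a minimal standard-library sketch; the helper name `records_per_organism` and the sample rows are hypothetical, and it assumes the download includes `gbifID` and `organismID` columns in a tab-delimited, unquoted file.

```python
import csv
import io
from collections import defaultdict


def records_per_organism(rows):
    """Map each non-empty organismID to the gbifIDs of the records carrying it.

    Only organismIDs shared by more than one record are returned.
    """
    by_organism = defaultdict(list)
    for row in rows:
        organism = (row.get("organismID") or "").strip()
        if organism:  # records without an organismID cannot be linked this way
            by_organism[organism].append(row["gbifID"])
    return {org: ids for org, ids in by_organism.items() if len(ids) > 1}


# Hypothetical sample: records 1 and 2 are two items from the same individual.
sample = (
    "gbifID\tspecies\tcatalogNumber\torganismID\n"
    "1\tPasser domesticus\tCAT-1\tORG-A\n"
    "2\tPasser domesticus\tCAT-2\tORG-A\n"
    "3\tPasser domesticus\tCAT-3\tORG-B\n"
)
rows = csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
print(records_per_organism(rows))
# prints {'ORG-A': ['1', '2']}
```

Whether to keep one or all records in such a group depends on the analysis: for counting individuals you might keep one record per organismID, while for questions about the physical items all of them are meaningful.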

ManonGros commented 3 years ago

The question seems to have been answered. Let us know if the issue needs to be reopened.