Closed Dina-Sharafeldeen closed 3 years ago
That is how the data is published: the catalog number and the organismId are different (for the example I looked at). So according to the dataset publisher, these are different organisms of the same species that happened to be collected on the same day at the same time. At least that is how it appears to me.
@rukayaj you can perhaps answer this? Better than I can at least.
@Dina-Sharafeldeen I think the records you refer to are from this dataset, right? https://www.gbif.org/dataset/264e6a66-9c9e-4115-9aec-29d694c68097#description The data curator added some explanatory text in the dataset description, here: https://www.gbif.org/dataset/264e6a66-9c9e-4115-9aec-29d694c68097#description
An accession generally refers to a given individual at a specific place and time; for each accession, one or more items may exist. Examples of item types include preserved skins, feathers or eggshells, blood, tissue or sperm samples, footage of live sperm motility, microscope slides prepared for sperm morphometric analyses, testes and extracted DNA.
I think it would be clearer if the text included some of the information we have in our internal wiki. I will talk to the data curators and reword/add things, but in the meantime I have pasted this verbatim from the wiki:
The records in the DNA datasets (* apart from the Mammal and Bird datasets) may possibly have some duplicate records in the non-DNA UiO NHM datasets. This is because the DNA collections are partially made up of DNA samples from the specimens in the main collections, which may sometimes be published separately on GBIF. Additionally, some DNA samples are taken from living organisms in the wild, which are then released and do not become museum specimens.
Multiple tissue samples are often taken from a specimen:
A great tit captured in a mist net at Tøyen on 19.06.2020 will be registered as one accession (=occurrence record) in our collection database; this record will hold info about species, locality, date, ring number, etc.
Let’s say I took a blood sample, a sperm sample and a feather sample from the bird before I released it; each of these samples will then be registered as “sub-records” on the main record created above (what we call “items”)
This one bird will then appear as three points on the map in Artskart/GBIF
Tissue samples are curated as separate collection objects from the specimen they are taken from (kept in different locations etc), and so are assigned individual occurrenceIDs - i.e. identifiers for each individual item (tissue sample, DNA extract, etc) belonging to a record/accession.
Occurrence records from the "same individual" (i.e. the same occurrence of an organism in space and time) have the same organismID in the simpledwc file and are also linked together through the resource relationship file.
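To make the accession/item model above concrete, here is a minimal sketch of how one accession might be flattened into several published records. The `Accession` class, the `to_occurrence_records` helper and the identifier format are hypothetical illustrations, not the museum's actual database schema; only the Darwin Core term names (occurrenceID, organismID, preparations, etc.) are real.

```python
from dataclasses import dataclass, field


@dataclass
class Accession:
    """One individual at a specific place and time (hypothetical model)."""
    organism_id: str
    species: str
    locality: str
    event_date: str
    items: list[str] = field(default_factory=list)  # e.g. "blood", "sperm"


def to_occurrence_records(accession: Accession) -> list[dict]:
    """Flatten an accession into one published record per item."""
    return [
        {
            # Each item gets its own identifier (this format is made up).
            "occurrenceID": f"{accession.organism_id}:item-{n}",
            # All items from one individual share the same organismID.
            "organismID": accession.organism_id,
            "scientificName": accession.species,
            "locality": accession.locality,
            "eventDate": accession.event_date,
            "preparations": item,
        }
        for n, item in enumerate(accession.items, start=1)
    ]


# The great tit example: one bird, three samples.
great_tit = Accession(
    organism_id="urn:example:accession-1",  # placeholder identifier
    species="Parus major",
    locality="Tøyen",
    event_date="2020-06-19",
    items=["blood", "sperm", "feather"],
)
records = to_occurrence_records(great_tit)
```

Here `records` holds three entries with three distinct occurrenceIDs but a single shared organismID, which is why one bird can show up as three points on the map.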
The resource relationship file may seem to sometimes contain discrepancies. However, these can usually be explained:
Also worth noting is this massive thread https://github.com/tdwg/dwc/issues/314 on the nature of MaterialSamples vs Occurrences - this is something GBIF Norway is following closely and we will of course adapt to what the community decides are the best practices.
So anyway, the records you are seeing may be from the same individual. Can you look at the organismID and see what that says? I will have time to look into this more deeply later next week.
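If it helps, here is a rough sketch of one way to check this in a download: group the rows by organismID and list the gbifIDs that share one. It assumes a tab-separated file that includes `gbifID` and `organismID` columns (which may depend on the download format you chose); the inline sample data and the `group_by_organism` helper are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for a tab-separated GBIF download (values are made up).
SAMPLE = (
    "gbifID\torganismID\tscientificName\teventDate\n"
    "1001\torg-A\tParus major\t2020-06-19\n"
    "1002\torg-A\tParus major\t2020-06-19\n"
    "1003\torg-B\tParus major\t2020-06-19\n"
)


def group_by_organism(tsv_file):
    """Map each organismID to the list of gbifIDs that carry it."""
    groups = defaultdict(list)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        key = row.get("organismID") or None
        if key is None:
            continue  # records without an organismID cannot be linked this way
        groups[key].append(row["gbifID"])
    return dict(groups)


groups = group_by_organism(io.StringIO(SAMPLE))
# Keep only organisms represented by more than one record.
shared = {org: ids for org, ids in groups.items() if len(ids) > 1}
print(shared)
```

With a real file you would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample. Records that end up in `shared` are likely several items from one individual rather than accidental duplicates, so whether to collapse them depends on whether you are counting individuals or samples.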
The question seems to have been answered. Let us know if the issue needs to be reopened.
Hi,
We are working on an occurrence dataset downloaded from GBIF. We noticed that some records of the same species with identical attribute values have different gbifIDs.
Kindly find a sample of this case under the following link:
https://drive.google.com/file/d/1D7WfjvKbBCzRj2uBz_f0_qkDnHinhRUs/view?usp=sharing
Should we consider these duplicates? If so, which ones should we keep and which ones should we delete?
Thanks,
Dina Sharafeldeen