NHMDenmark / DanSpecify

Important files regarding the Danish instance of the Specify database system for collections digitisation and management, plus a placeholder for issue tracking. Guidelines, manuals and other documentation will be gathered on the wiki.

Retirement of Fish collection GBIF dataset #261

Closed: FedorSteeman closed this issue 7 months ago

FedorSteeman commented 8 months ago

The old static dataset uploaded to DanBIF back in the day needs to be replaced by the new dynamic dataset by swapping the data endpoints. However, before we can do that, we need to ensure that record-level linkage is retained.

The URL of the dataset: https://www.gbif.org/dataset/84dbaec2-f762-11e1-a439-00145eb45e9a

The challenge here is that the original catalogue numbers were not associated with the records. The catalogue numbers used in the dataset are not real catalogue numbers, but auto-incremented serial numbers of some sort. We need to somehow associate the original catalogue numbers, as well as the occurrenceIDs, with these records before we swap out the endpoints. GBIF will help us if we can provide a list pairing the new occurrenceIDs (UUIDs) with the old ones (URNs).

FedorSteeman commented 7 months ago

It took more than a month, but after considerable effort I finally produced some level of correlation between the new and the old occurrence IDs of our fish dataset. We only had species name, locality and event date to go on and, unfortunately, we could only cover a bit under half (47%) of the old static dataset. The rest must be considered a lost cause, as it has proved too hard to link the old records to the new ones.
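The matching described above (linking on species name, locality and event date only) can be sketched as follows. This is a minimal illustration, not the procedure actually used for this dataset: it assumes each record is a dict with Darwin Core-style keys (`scientificName`, `locality`, `eventDate`, `occurrenceID`, and `gbifID` on the old side), and the normalisation rules, the helper names and the one-to-one matching rule are all hypothetical choices. The sample records are invented placeholders.

```python
from collections import defaultdict


def normalise(value):
    # Hypothetical normalisation: collapse whitespace and lowercase,
    # so trivial formatting differences do not block a match.
    return " ".join(str(value or "").split()).lower()


def match_key(record):
    # The only three fields assumed to be available on both sides.
    return (
        normalise(record.get("scientificName")),
        normalise(record.get("locality")),
        normalise(record.get("eventDate")),
    )


def link_records(old_records, new_records):
    """Pair old and new occurrenceIDs whose keys match exactly one record on each side.

    Returns (pairs, coverage), where each pair is
    (gbifID, old occurrenceID, new occurrenceID) and coverage is the
    fraction of old records that received a pairing.
    """
    old_by_key = defaultdict(list)
    for rec in old_records:
        old_by_key[match_key(rec)].append(rec)
    new_by_key = defaultdict(list)
    for rec in new_records:
        new_by_key[match_key(rec)].append(rec)

    pairs = []
    for key, olds in old_by_key.items():
        news = new_by_key.get(key, [])
        # Ambiguous keys (several records on either side) are skipped
        # rather than risk linking to the wrong specimen.
        if len(olds) == 1 and len(news) == 1:
            pairs.append((olds[0]["gbifID"], olds[0]["occurrenceID"], news[0]["occurrenceID"]))

    coverage = len(pairs) / len(old_records) if old_records else 0.0
    return pairs, coverage


# Invented example records (hypothetical identifiers and values).
old_records = [
    {"gbifID": "1", "occurrenceID": "urn:catalog:EXAMPLE:FISH:1",
     "scientificName": "Gadus morhua", "locality": "Øresund", "eventDate": "1923-05-01"},
    {"gbifID": "2", "occurrenceID": "urn:catalog:EXAMPLE:FISH:2",
     "scientificName": "Clupea harengus", "locality": "Kattegat", "eventDate": "1930-07-02"},
]
new_records = [
    {"occurrenceID": "00000000-0000-0000-0000-000000000001",
     "scientificName": "Gadus  morhua", "locality": "ØRESUND ", "eventDate": "1923-05-01"},
    {"occurrenceID": "00000000-0000-0000-0000-000000000002",
     "scientificName": "Clupea harengus", "locality": "Skagerrak", "eventDate": "1931-08-12"},
]

pairs, coverage = link_records(old_records, new_records)
print(pairs)
print(f"coverage: {coverage:.0%}")
```

Only accepting one-to-one matches trades coverage for safety: with so few fields to match on, a key shared by several specimens cannot be resolved reliably, which is one reason a large share of old records may remain unlinked.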

The attached file has columns for the gbifID and the current occurrence ID (in triplet form), which should be replaced with the new GUIDs. The real catalogue numbers (and alternative/old catalogue numbers) are also provided for replacement, in case this makes any difference for the imminent dataset swap.

When this is done, we can swap the endpoint with that of the dynamic dataset, and a significant level of record-level linkage should be retained.

GBIF-FISK-new-occurrenceIDs.xls

FedorSteeman commented 7 months ago

Done