cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

FORS change of identifiers #640

Open john-shepherdson opened 5 months ago

john-shepherdson commented 5 months ago

From: Guillaume Lefebvre
Date: Fri, 8 Mar 2024 at 10:59
Subject: Question about OAI-PMH harvesting and identifier change
To: John Shepherdson

As you know, our application SWISSUbase is currently harvested by the CESSDA catalogue.

In the near future, we plan to change how our internal dataset versioning works within SWISSUbase. This means we would change some of the metadata identifiers currently provided through our OAI-PMH endpoint.

First, we will need to change the OAI-PMH record header "identifier" field from something like "oai:swissubase.ch:1000-1-0" to "oai:swissubase.ch:9517409b-6854-48cf-913a-2e84ea4782bd".

Second, we currently provide 2 IDNo within the DDI profile, one being a DOI identifier and the other an internal SWISSUbase identifier. We will drop the latter, keeping only the DOI identifier as the unique IDNo.

Basically, we would like to know how to proceed with this. I expect that records cannot really be updated, since the record "identifier" will change. So:

1. Shall we just release our new SWISSUbase version with the new versioning scheme, so that it is then harvested by CESSDA, duplicates are created, and CESSDA can then drop the old records?
2. Shall we sync our release with CESSDA, so that old records are dropped first and the new ones are then harvested?
3. Do you have another solution to propose?

john-shepherdson commented 5 months ago

I think we should drop their current records then harvest the new ones. Any reason why not?

matthew-morris-cessda commented 5 months ago

I don't think we need to do anything. The old records will be automatically deleted and having 2 IDNo elements isn't an issue, as we deal with this in other repositories.

john-shepherdson commented 5 months ago

Why will the old records be deleted automatically? The new records will have a different value in the OAI-PMH record header "identifier" field. Won't they be considered different records from the existing ones?

matthew-morris-cessda commented 5 months ago

Yes, that's why the old records will be automatically deleted. The pipeline will see the old identifiers as orphaned and delete them, and write out the records with the new identifiers.
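To illustrate the orphan handling described above, here is a minimal sketch. It is not the actual CESSDA pipeline code: the function name `plan_sync` and the idea of comparing two plain lists of identifiers are assumptions made for illustration only. It shows how a harvest that returns only the new identifiers would mark the old ones for deletion and the new ones for writing.

```python
def plan_sync(indexed_ids, harvested_ids):
    """Hypothetical helper: compare identifiers already stored for a
    repository with those returned by the latest harvest.

    Returns a tuple (orphaned, new):
      orphaned - stored identifiers no longer offered by the endpoint
      new      - harvested identifiers not stored yet
    """
    indexed = set(indexed_ids)
    harvested = set(harvested_ids)
    orphaned = sorted(indexed - harvested)  # candidates for deletion
    new = sorted(harvested - indexed)       # candidates for writing
    return orphaned, new


# Identifiers taken from the example in the email above.
indexed = ["oai:swissubase.ch:1000-1-0"]
harvested = ["oai:swissubase.ch:9517409b-6854-48cf-913a-2e84ea4782bd"]

orphaned, new = plan_sync(indexed, harvested)
print("delete:", orphaned)
print("write:", new)
```

Under these assumptions, no manual clean-up or synchronised release is needed: the old identifier ends up in the delete list and the new one in the write list during the same run.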