SAEONData / Open-Data-Platform

SAEON Open Data Platform core services, APIs and UIs.
MIT License

Representation of the digital object identified by a DOI in the metadata #27

Open marksparkza opened 2 years ago

marksparkza commented 2 years ago

Issue moved here from the private archived ODP-Data repository. ODP-Data contains schemas and initialization data and scripts for the ODP v1 platform.

Currently, the SAEON DataCite 4.3 schema has a dependency rule which says that if "doi" is present in the metadata, then "immutableResource" must also be present.
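For illustration, the rule as described amounts to the check below. This is a minimal sketch, not the actual schema definition: the function name and the sample records are hypothetical, and the contents of "immutableResource" are left empty because they are not described here. If the schema is written in JSON Schema, a rule like this would typically be expressed with the `dependencies` keyword (draft-07) or `dependentRequired` (2019-09 onwards).

```python
def check_doi_dependency(metadata: dict) -> list[str]:
    """Return violations of the rule: "doi" present => "immutableResource" present.

    Hypothetical helper for illustration; not part of the ODP codebase.
    """
    errors = []
    if "doi" in metadata and "immutableResource" not in metadata:
        errors.append('"immutableResource" is required when "doi" is present')
    return errors


# Hypothetical records; the DOI value is a placeholder.
with_both = {"doi": "10.1234/example", "immutableResource": {}}
doi_only = {"doi": "10.1234/example"}
no_doi = {"titles": []}

print(check_doi_dependency(with_both))  # no violations
print(check_doi_dependency(doi_only))   # one violation
print(check_doi_dependency(no_doi))     # rule does not apply
```

A record with a DOI but no online resource, such as `doi_only`, is exactly the case the rule rejects.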

This dependency rule was created based on the idea that if there is a DOI, there must be a corresponding digital object. In practice, however, DOIs may be issued for datasets that do not yet exist, datasets might only be available offline or subject to access control, and datasets may be retracted due to obsolescence.

The short-term solution to these pitfalls is simply to remove this dependency rule. [This was done on 19 Nov 2020]

In the long term, however, we need to rethink the structure, content and purpose of the immutableResource and linkedResources properties. Some of the above conditions are currently catered for by linkedResourceType (OfflineAccess, ConditionalAccess), while immutableResource really only caters for the case of a dataset being online and available.

I propose a new property, "digitalObject", which explicitly represents the resource identified by a DOI, regardless of the status of that resource. This would supersede immutableResource and the two linkedResource cases mentioned above (OfflineAccess and ConditionalAccess). We could then re-introduce a dependency rule which says "if I have a DOI, I must have a digital object". The rationale is that if there is a DOI in the metadata, the metadata should say something about the digital object identified by that DOI. Such a property would, at a minimum, indicate the status of the resource, which could, for example, be one of "pending", "online", "offline", "restricted", or "retracted". The status would then determine which sub-properties can or should be present under "digitalObject".
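To make the proposal concrete, here is a rough sketch of how the re-introduced rule and the status-driven sub-properties might be checked. The status values come from the list above; everything else is an assumption for discussion: the function name, the `REQUIRED_BY_STATUS` mapping, and the sub-property names ("url", "accessInstructions", "accessConditions", "retractionReason") are all hypothetical placeholders, not a proposed final design.

```python
# Status values as proposed above.
STATUSES = {"pending", "online", "offline", "restricted", "retracted"}

# Hypothetical mapping of status -> sub-properties that must be present.
# The sub-property names are illustrative only.
REQUIRED_BY_STATUS = {
    "pending": set(),
    "online": {"url"},
    "offline": {"accessInstructions"},
    "restricted": {"accessConditions"},
    "retracted": {"retractionReason"},
}


def check_digital_object(metadata: dict) -> list[str]:
    """Return violations of the rule: "doi" present => valid "digitalObject"."""
    if "doi" not in metadata:
        return []  # the rule only applies when a DOI is present

    obj = metadata.get("digitalObject")
    if not isinstance(obj, dict):
        return ['"digitalObject" is required when "doi" is present']

    status = obj.get("status")
    if not isinstance(status, str) or status not in STATUSES:
        return [f"unknown or missing status: {status!r}"]

    # The status determines which sub-properties must accompany it.
    missing = REQUIRED_BY_STATUS[status] - set(obj)
    return [
        f'"{name}" is required when status is "{status}"'
        for name in sorted(missing)
    ]


# Hypothetical record: a DOI issued for a dataset that does not exist yet.
pending = {"doi": "10.1234/example", "digitalObject": {"status": "pending"}}
print(check_digital_object(pending))  # no violations
```

Under this sketch a DOI for a not-yet-existing, offline, restricted, or retracted dataset still validates, as long as the record says which of those states applies.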

marksparkza commented 2 years ago

Currently there's a provisional 'dataset' entity on the ERD. There's potentially a case here for using this to represent any digital object associated with a DOI, and autogenerating the relevant metadata properties at publishing time.

If we implement 'dataset' as a first class entity in the ODP - linked directly to an ODP metadata record - we open up a whole lot of possibilities for more systematic data(set) management.