Datasets with multiple and no (yet) IDs

RDA-DMP-Common / RDA-DMP-Common-Standard

Official outputs from the RDA DMP Common Standards WG

The Unlicense

Datasets with multiple and no (yet) IDs #34

Open MarekSuchanek opened 4 years ago

MarekSuchanek commented 4 years ago

While adjusting our model with @rwwh, we found that requiring a dataset to have exactly one "dataset_id" is too limiting.

A dataset can have multiple identifiers, for example DOI + ARK + URL.
During data management planning, we plan which datasets we are going to have, and those may not yet be published, so they may not have an identifier yet.

TomMiksa commented 4 years ago

If we changed the cardinality of dataset_id, would it also solve issue #33? That is, a list of identifiers would include both "historical" and current identifiers.

MarekSuchanek commented 4 years ago

I think it would, if the cardinality were changed to 0..n. Of course, I am not sure whether we would then also need to distinguish current, historical, or even reserved identifiers somehow.
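To make the 0..n idea concrete, here is a minimal Python sketch. It assumes "dataset_id" becomes a list of entries with "identifier" and "type", and it adds a "status" field to tell current, historical, and reserved identifiers apart. The "status" field, its values, and the helper `current_identifier` are hypothetical and only illustrate the question raised above.

```python
from typing import Optional

# Assumption: "dataset_id" is a list (0..n) instead of a single object.
# The "status" field is hypothetical and not part of the standard.
planned_dataset = {"title": "Survey results", "dataset_id": []}

published_dataset = {
    "title": "Sensor readings",
    "dataset_id": [
        {"identifier": "https://example.org/data/1", "type": "url", "status": "historical"},
        {"identifier": "10.123/1234abc", "type": "doi", "status": "current"},
        {"identifier": "ark:/12345/x9", "type": "ark", "status": "reserved"},
    ],
}


def current_identifier(dataset: dict) -> Optional[dict]:
    """Return the first identifier marked "current", or None if there is none."""
    for entry in dataset.get("dataset_id", []):
        if entry.get("status") == "current":
            return entry
    return None
```

A planned dataset with an empty list simply yields None, which covers the "no (yet) IDs" case without a placeholder identifier.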

briri commented 4 years ago

I can see the usefulness of allowing for alternate/additional identifiers. I think we need to understand the use cases.

If the primary use case is to allow for historical identifiers (versions) of an object, we could perhaps introduce something like a related_identifiers array. This is a common pattern, and I think it could solve most cases. For example:

{
  "related_identifiers": [
    { "type": "doi", "identifier": "10.123/1234abc", "relation_type": "is_version_of" }
  ]
}

I'm not sure what ontology would be most appropriate for the relation_type.
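As a rough sketch of how a consumer might read such an array, the Python below filters related identifiers by relation_type. The helper name `identifiers_with_relation` and the sample title are assumptions for illustration. The "related_identifiers" layout follows the example above, and the "dataset_id" object is assumed to keep its single identifier/type form.

```python
# Sample dataset using the proposed "related_identifiers" array.
dataset = {
    "title": "Sensor readings v2",
    "dataset_id": {"identifier": "10.123/5678def", "type": "doi"},
    "related_identifiers": [
        {"type": "doi", "identifier": "10.123/1234abc", "relation_type": "is_version_of"},
    ],
}


def identifiers_with_relation(dataset: dict, relation_type: str) -> list:
    """Collect the identifiers whose relation_type matches the requested one."""
    return [
        related["identifier"]
        for related in dataset.get("related_identifiers", [])
        if related.get("relation_type") == relation_type
    ]
```

With this layout the primary dataset_id stays a single value, and historical versions are found only when a consumer asks for them.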

JacquemotMC commented 3 years ago

> I'm not sure what ontology would be most appropriate for the relation_type.

dct:isVersionOf (the isVersionOf property from DCMI Metadata Terms)
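If DCMI Metadata Terms were used for this, the snake_case relation_type values could be mapped to term URIs as sketched below. The mapping table, the choice of snake_case keys, and the helper `relation_uri` are assumptions. The four terms themselves exist in the http://purl.org/dc/terms/ namespace.

```python
# Namespace of DCMI Metadata Terms, commonly abbreviated as "dct" or "dcterms".
DCT = "http://purl.org/dc/terms/"

# Hypothetical mapping from relation_type values to DCMI term URIs.
RELATION_TO_DCT = {
    "is_version_of": DCT + "isVersionOf",
    "has_version": DCT + "hasVersion",
    "replaces": DCT + "replaces",
    "is_replaced_by": DCT + "isReplacedBy",
}


def relation_uri(relation_type: str):
    """Return the DCMI term URI for a relation_type, or None if it is unmapped."""
    return RELATION_TO_DCT.get(relation_type)
```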

paulwalk commented 3 years ago

While I don't disagree with the reasoning here, I would like to push back a little against the idea of handling multiple IDs, and especially of modelling their semantics, in the DMP Common Standard. While I agree that all of these things exist, the question for us to consider is:

"Do we need to model these things to enable the exchange of semantically useful DMPs?"

I would like to argue that using a single ID (consistently) is enough to achieve this. If someone wants to relate multiple IDs to each other, that can be done outside of the DMP standard.
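A minimal Python sketch of that alternative: the DMP carries exactly one dataset_id, and a separate lookup relates it to other identifiers. The registry `ALTERNATE_IDS`, its contents, and the helper `all_known_identifiers` are hypothetical stand-ins for whatever external service would hold those relations.

```python
# The DMP itself carries a single, consistently used identifier.
dataset = {
    "title": "Sensor readings",
    "dataset_id": {"identifier": "10.123/1234abc", "type": "doi"},
}

# Hypothetical external registry, maintained outside the DMP document.
ALTERNATE_IDS = {
    "10.123/1234abc": ["ark:/12345/x9", "https://example.org/data/1"],
}


def all_known_identifiers(dataset: dict, registry: dict) -> list:
    """Return the primary identifier followed by any alternates the registry knows."""
    primary = dataset["dataset_id"]["identifier"]
    return [primary] + registry.get(primary, [])
```

The exchanged DMP stays simple here, at the cost of needing a second source to resolve alternates.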