RDA-DMP-Common / RDA-DMP-Common-Standard

Official outputs from the RDA DMP Common Standards WG
The Unlicense

Host description #55

Open JacquemotMC opened 3 years ago

JacquemotMC commented 3 years ago

In order to enable an "automatic" pre-population of the host description, I have made this comparison with re3Data. Here are my observations:

HostComparison.xlsx

- Alignment of properties, e.g. repositoryUrl.
- Some properties regarding backup are missing.
- The ISO standard used for geolocation is not the same either.

TomMiksa commented 3 years ago

The Host section of the standard reuses a subset of fields from re3data, which is why you were able to map them 1:1. Some properties, such as backup_frequency, storage_type and backup_type, were added based on the user stories from the open stakeholder consultation. As your mapping indicates, they cannot be populated with information from re3data. This is not an issue, though, because they are optional.

I can see in your mapping that re3data provides an identifier property for the repository, and that there is no counterpart in the recommendation. I think that at the time we were drafting the recommendation, identifiers for repositories were simply not available yet.

Do you think that we should extend the recommendation with the identifier for Host?
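To make the pre-population idea concrete, here is a minimal sketch of how a tool might fill the Host section from a re3data-style record. The source key names (`repositoryName`, `repositoryUrl`), the mapping table, and both helper functions are assumptions made for illustration; they are not taken from the re3data schema or from the recommendation.

```python
# Hypothetical mapping from re3data-style keys to Host fields.
# The key names are assumptions for illustration only.
RE3DATA_TO_HOST = {
    "repositoryName": "title",
    "repositoryUrl": "url",
}

# Optional Host properties that, per this thread, cannot be
# populated with information from re3data.
NOT_IN_RE3DATA = ("backup_frequency", "backup_type", "storage_type")


def prepopulate_host(record):
    """Build a partial Host dict from a re3data-style record."""
    host = {}
    for source_key, host_key in RE3DATA_TO_HOST.items():
        value = record.get(source_key)
        if value:  # skip missing or empty values
            host[host_key] = value
    return host


def missing_optional(host):
    """List the optional properties that still need manual input."""
    return [name for name in NOT_IN_RE3DATA if name not in host]


record = {
    "repositoryName": "Example Repository",
    "repositoryUrl": "https://repository.example.org",
}
host = prepopulate_host(record)
print(host)
print(missing_optional(host))
```

Because the backup and storage properties are optional, the partial Host produced here would still be usable; the second helper only reports what a user could add by hand.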

briri commented 3 years ago

👍🏻 Yes, I think it makes sense to add 'identifier' to 'Host'.

JacquemotMC commented 3 years ago

I would also be in favor of an identifier. That way we would be applying some of the maDMP principles.

paulwalk commented 3 years ago

I'm not sure I understand the need for a separate identifier. The URI for the host is already a globally unique ID - why is this not sufficient?

briri commented 3 years ago

I think the URI for the host is fine if that is all you have to work with. The standard defines the host URL as the repository's homepage, and while that is unique, it doesn't provide any further machine-readable context.

Like @JacquemotMC, we're using re3data's repository registry in our tool which can provide much of the host information (e.g. backup_frequency, pid_system, etc.). I think, as an aside, that sites like re3data would ideally maintain and manage host metadata rather than it being a part of this standard unless there's a very specific reason to have a historical snapshot of the host's metadata within this standard.

briri commented 1 year ago

I was considering opening a PR to add host_id for this issue but noticed that one of the examples contains:

"host": {
  "title": "GitHub",
  "url": "https://www.re3data.org/repository/r3d100010375",
  "host_id_type": "HTTP-RE3DATA"
}

However, I don't see this documented anywhere else. Should we not follow the same pattern that we use elsewhere for identifiers?

"host": {
  "title": "GitHub",
  "host_id": {
    "type": "url",
    "identifier": "https://www.re3data.org/repository/r3d100010375"
  }
}
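If the nested pattern were adopted, a consuming tool could convert the undocumented form into it. The following is a rough sketch only: the function name `normalize_host` is hypothetical, the choice of `"url"` as the identifier type simply mirrors the example above, and the original `host_id_type` value is discarded, which may or may not be the desired behaviour.

```python
def normalize_host(host):
    """Return a copy of host that uses the nested host_id pattern.

    Assumption: when host_id_type is present, the url field holds the
    registry identifier rather than the repository homepage.
    """
    has_legacy_form = "host_id_type" in host and "url" in host
    if not has_legacy_form or "host_id" in host:
        # Nothing to convert; return an unmodified shallow copy.
        return dict(host)

    # Drop the legacy fields and re-express the URL as a nested identifier.
    result = {
        key: value
        for key, value in host.items()
        if key not in ("url", "host_id_type")
    }
    result["host_id"] = {"type": "url", "identifier": host["url"]}
    return result


legacy = {
    "title": "GitHub",
    "url": "https://www.re3data.org/repository/r3d100010375",
    "host_id_type": "HTTP-RE3DATA",
}
print(normalize_host(legacy))
```

A host that already carries `host_id`, or that has no `host_id_type`, passes through unchanged, so the conversion can be applied to mixed input.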