JacquemotMC opened this issue 3 years ago
The Host section of the standard reuses a subset of fields from re3data. Hence, you were able to map them 1:1. There are properties such as backup_frequency, storage_type and backup_type which we added based on the user stories from the open stakeholder consultation. As your mapping indicates, they cannot be populated with information from re3data. This is not an issue, though, because they are optional.
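To make the 1:1 mapping and the optional gaps concrete, here is a minimal sketch of pre-populating a Host object from a re3data record that has already been parsed into a dictionary. The re3data key names (repositoryName, repositoryURL, description, pidSystem) and the helper names are illustrative assumptions, not a verified excerpt of the re3data schema or of any existing tool.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from re3data-style keys to Host properties.
# The re3data key names are assumptions; check them against the
# actual re3data metadata schema before relying on them.
RE3DATA_TO_HOST: dict[str, str] = {
    "repositoryName": "title",
    "repositoryURL": "url",
    "description": "description",
    "pidSystem": "pid_system",
}

# Host properties added from the stakeholder consultation; re3data offers
# no source for them, so they are left for the user to fill in.
UNMAPPED_OPTIONAL = ("backup_frequency", "storage_type", "backup_type")


def host_from_re3data(record: dict[str, Any]) -> dict[str, Any]:
    """Copy the fields that map 1:1, skipping anything absent in the record."""
    return {
        host_key: record[re3_key]
        for re3_key, host_key in RE3DATA_TO_HOST.items()
        if record.get(re3_key) is not None
    }


def missing_optional(host: dict[str, Any]) -> list[str]:
    """List the optional Host properties that still need a manual value."""
    return [key for key in UNMAPPED_OPTIONAL if key not in host]
```

Because the backup and storage properties are optional, the sketch simply omits them instead of inventing placeholder values; missing_optional can then tell a tool which fields to ask the user about.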
I can see in your mapping that there is an identifier property for the repository available at re3data, and there is no counterpart in the recommendation. I think that at the time we were drafting the recommendation, identifiers for repositories were simply not available yet.
Do you think that we should extend the recommendation with an identifier for Host?
👍🏻 Yes, I think it makes sense to add 'identifier' to 'Host'.
I would also be in favor of an identifier. That way, we would apply some of the maDMP principles.
I'm not sure I understand the need for a separate identifier. The URI for the host is already a globally unique ID - why is this not sufficient?
I think the URI for the host is fine if that is all you have to work with. The definition in the standard for the host URL describes it as the repository's homepage; while unique, it doesn't provide any further machine-readable context.
Like @JacquemotMC, we're using re3data's repository registry in our tool, which can provide much of the host information (e.g. backup_frequency, pid_system, etc.). As an aside, I think that sites like re3data would ideally maintain and manage host metadata rather than it being part of this standard, unless there's a very specific reason to keep a historical snapshot of the host's metadata within this standard.
I was considering opening a PR to add host_id for this issue, but noticed that one of the examples contains:
"host": {
  "title": "GitHub",
  "url": "https://www.re3data.org/repository/r3d100010375",
  "host_id_type": "HTTP-RE3DATA"
}
I don't see this documented anywhere else, however. Should we not follow the same pattern that we use elsewhere for identifiers?
"host": {
  "title": "GitHub",
  "host_id": {
    "type": "url",
    "identifier": "https://www.re3data.org/repository/r3d100010375"
  }
}
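If the nested host_id pattern were adopted, existing documents using the ad-hoc "url" plus "host_id_type" shape could be migrated mechanically. A minimal sketch, assuming hosts are handled as plain dictionaries; the function name is hypothetical, and always emitting "type": "url" (as in the proposed snippet) is an assumption rather than an agreed rule.

```python
from __future__ import annotations

from typing import Any


def to_proposed_host(host: dict[str, Any]) -> dict[str, Any]:
    """Move the re3data URL from "url" into a nested host_id object.

    Accepts the shape used in the current example ("url" plus "host_id_type")
    and returns the shape proposed in this thread (host_id with "type" and
    "identifier"). Hosts without "host_id_type" are returned unchanged.
    """
    out = dict(host)  # shallow copy so the caller's dict is not modified
    if "host_id_type" not in out:
        return out
    out.pop("host_id_type")
    identifier = out.pop("url", None)
    if identifier is None:
        raise ValueError("host_id_type is set but there is no url to move")
    # Assumption: the re3data landing-page URL is recorded with type "url",
    # as in the proposed snippet, regardless of the old host_id_type value.
    out["host_id"] = {"type": "url", "identifier": identifier}
    return out
```

One design question this surfaces: in the old example the url field held the re3data record rather than the repository's homepage, so after migration the Host has no homepage URL unless it is added separately.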
In order to enable an "automatic" pre-population of the host description, I have made this comparison with re3data. Here are my observations:
HostComparison.xlsx
- alignment of properties: e.g. repositoryUrl
- some missing properties regarding backup
- the ISO standard for geolocation is not the same either
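On the geolocation point, the mismatch is plausibly between three-letter and two-letter country codes. Assuming re3data records countries as ISO 3166-1 alpha-3 codes and the Host geo_location property expects ISO 3166-1 alpha-2 (worth confirming against both schemas), pre-population would need a conversion step. A minimal sketch with a deliberately small, illustrative lookup table and a hypothetical function name:

```python
from __future__ import annotations

# Illustrative subset of ISO 3166-1 alpha-3 -> alpha-2 codes only.
# A real converter would load the complete ISO 3166-1 table.
ALPHA3_TO_ALPHA2: dict[str, str] = {
    "AUT": "AT",
    "DEU": "DE",
    "FRA": "FR",
    "GBR": "GB",
    "USA": "US",
}


def to_alpha2(code: str) -> str | None:
    """Return the alpha-2 code for an alpha-3 code, or None if unknown."""
    return ALPHA3_TO_ALPHA2.get(code.strip().upper())
```

Returning None for unknown codes lets a tool leave geo_location empty (it is optional) instead of writing a value that does not validate.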