Health-RI / health-ri-metadata

health ri metadata schemas

Distribution identifier is missing. #142

Open hcvdwerf opened 2 months ago

hcvdwerf commented 2 months ago

As a researcher, I want to request a distribution. I expect an identifier within the distribution to make linking possible. I think it should be mandatory.

brunasv commented 2 months ago

A Distribution in DCAT doesn't have an identifier; the Dataset to which the Distribution is connected has one. A Distribution has an access URL (or download URL). I think the FAIR Data Point creates a sort of identifier to link all layers (catalog, dataset, distribution); this would be nice to check with @kburger.

JackBroeren commented 1 month ago

I would like to have a distribution identifier as well. If a dataset has multiple distributions with the same data in different formats, I would like to select the distribution in the format that is most convenient for me. So I would like to be able to specify in a request tool that I want a specific distribution. An identifier would be helpful, preferably explicitly provided by the data holder and not generated internally by a tool like FDP. Suppose I don't use an FDP?

hcvdwerf commented 1 month ago

@JackBroeren But when the format is different, the access URL is also different: data.csv vs. data.xml. Is that what you mean? I can imagine that you don't want to rely on that alone.

JackBroeren commented 1 month ago

@hcvdwerf @brunasv The accessURL is foreseen as a URL that refers to an HTML page describing the process for getting access to the distribution. I would not like to have the distribution identifier hidden in this URL; it should be explicitly defined. Last Friday I saw an EHDS demo of the catalog and the request process. In that demo a requester does not request access to a dataset but to a specific distribution, as in my earlier example. So a distribution ID is necessary (IMHO), and it should be specified by the data holder and not by some tool in the middle (different tools can create the same identifier; by the way, I would prefer a globally unique, persistent identifier). A distribution ID should be part of the metadata (like a keyword or a theme).
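To make the two positions concrete, here is a minimal sketch in plain Python. It assumes a hypothetical metadata record in which each distribution carries its own `identifier` (the field proposed in this issue, not something DCAT-AP currently specifies) and in which both distributions share one `accessURL` because that URL points to a page describing the access procedure. All URLs, keys, and function names are illustrative only.

```python
# Hypothetical metadata record. The "identifier" key on each distribution is
# the field proposed in this issue; it is an assumption, not current DCAT-AP.
dataset = {
    "identifier": "https://example.org/dataset/1",
    "distributions": [
        {
            "identifier": "https://example.org/distribution/1-csv",
            "format": "text/csv",
            # Assumed: both formats share one page describing how to get access.
            "accessURL": "https://example.org/access/dataset-1",
        },
        {
            "identifier": "https://example.org/distribution/1-xml",
            "format": "application/xml",
            "accessURL": "https://example.org/access/dataset-1",
        },
    ],
}


def find_by_identifier(dataset, identifier):
    """Return the distribution with the given identifier, or None."""
    for dist in dataset["distributions"]:
        if dist.get("identifier") == identifier:
            return dist
    return None


def find_by_access_url(dataset, access_url):
    """Return every distribution that uses the given accessURL."""
    return [d for d in dataset["distributions"] if d.get("accessURL") == access_url]
```

Under these assumptions, looking up by `accessURL` returns both distributions, while looking up by the explicit identifier returns exactly one. If every distribution had a distinct `accessURL` (data.csv vs. data.xml), the URL alone would also be sufficient to tell them apart.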

kburger commented 1 month ago

Both dcat-ap (3) and healthdcat do not specify a dct:identifier on distribution (or catalog for that matter), only on a dataset. If it is decided to add it, we'd have to be careful with the cardinality.
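The cardinality concern can be sketched as a small check. This is only an illustration: it assumes a distribution is represented as a plain dictionary whose `identifier` value is a list, and the function name and the two candidate cardinalities (optional `0..1` versus mandatory `1..1`) are hypothetical choices, not something either specification defines.

```python
def identifier_count_ok(distribution, min_count, max_count):
    """Check that the number of identifiers lies within [min_count, max_count].

    max_count=None means "unbounded" (e.g. a 0..n or 1..n cardinality).
    """
    count = len(distribution.get("identifier", []))
    if count < min_count:
        return False
    return max_count is None or count <= max_count


# Hypothetical distributions: one without and one with an identifier.
without_id = {"accessURL": "https://example.org/access/dataset-1"}
with_id = {
    "accessURL": "https://example.org/access/dataset-1",
    "identifier": ["https://example.org/distribution/1-csv"],
}
```

With an optional `0..1` cardinality, existing records without an identifier stay valid; with a mandatory `1..1` cardinality, as requested in the opening comment, they would fail validation until the data holder supplies one.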

jundahuang9123 commented 1 month ago

@JackBroeren accessURL is more than enough to refer to the distribution. The URL can be used to describe how to get access, but it is intended to point to the distribution. EHDS puts the accessRight entirely at the Dataset level, so a request can only be made at the dataset level. I didn't see the demo, so I'm not sure whether something in it was misleading. You mentioned that you don't use an FDP; what do you use to expose your metadata? In FDP, distribution-level metadata is not exposed per catalog/dataset. If the metadata is not exposed at the distribution level, the identifier, however persistent and global, would be of no use to data users/requesters.

jambelien commented 1 month ago

Based on this very good discussion and the various viewpoints: @HNeikes, should we discuss this in the Geonovum DCAT-AP expert group as well? So import/copy this to the Geonovum repo?

HNeikes commented 1 month ago

https://github.com/Geonovum/DCAT-AP-NL30/issues/193