metadata identifier property

idevisser commented 4 years ago

I have the impression that this version of DCAT-AP still does not let us avoid the current problems with the uniqueness of metadata records. When spatial catalogs (OGC CSW) harvest each other, the uniqueness of a metadata record is checked through the metadata identifier (fileIdentifier), and this prevents duplicate metadata records for the same dataset. The European open data portal, which is based on DCAT, harvests national open data portals based on DCAT as well as the European INSPIRE portal and national spatial portals, which are based on ISO. This is where the problem becomes clear: DCAT 1 has no mandatory property that holds this unique, unmodified fileIdentifier. When harvesting from different sources that partly contain the same metadata, duplicates are created: they describe the same dataset but are not recognised as the same metadata. Please add a mandatory identifier property to the dcat:CatalogRecord class.
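A minimal sketch of the identifier-based duplicate check described above, assuming each harvested record exposes the file identifier of its original metadata. The `HarvestedRecord` type, the `deduplicate` function and the sample data are hypothetical. It shows why a record that arrives without that identifier cannot be matched and therefore ends up as a duplicate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HarvestedRecord:
    # Hypothetical, simplified view of a harvested metadata record.
    source_portal: str              # portal the record was harvested from
    file_identifier: Optional[str]  # e.g. the ISO fileIdentifier; None if it was not carried over
    title: str


def deduplicate(records: List[HarvestedRecord]) -> List[HarvestedRecord]:
    """Keep only the first record seen for each file identifier.

    Records without an identifier cannot be matched against anything,
    so every one of them is kept: this is where duplicates slip through.
    """
    seen = set()
    kept = []
    for rec in records:
        if rec.file_identifier is None:
            kept.append(rec)
        elif rec.file_identifier not in seen:
            seen.add(rec.file_identifier)
            kept.append(rec)
    return kept


# Hypothetical harvest: the same dataset reaches the aggregator three times.
harvest = [
    HarvestedRecord("INSPIRE portal", "abc-123", "Land cover"),
    HarvestedRecord("national spatial portal", "abc-123", "Land cover"),
    HarvestedRecord("national open data portal", None, "Land cover"),
]
unique = deduplicate(harvest)
print(len(unique))  # 2: the record that lost its identifier is not recognised
```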

makxdekkers commented 4 years ago

@idevisser I think the idea was that dct:source would be used to refer "to the original metadata that was used in creating metadata for the Dataset" as stated in the Usage note for that property. In general, the model of DCAT does not have a notion of an immutable metadata record; the metadata for a Dataset may well be derived from some original metadata, but any aggregator can add metadata statements or enrich existing ones.
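As a sketch of how an aggregator could use dct:source in this way, assume each harvested catalog record points via dct:source to the metadata it was derived from. Records can then be grouped by the original they trace back to. The record URIs, the plain-dict representation and the function names below are hypothetical (this is not a DCAT-AP serialisation), and the approach only works if every harvester in the chain fills in dct:source.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical harvested catalog records, keyed by record URI.
# "source" stands in for dct:source: the URI of the original metadata the
# record was derived from, or None for a record that is itself the original.
records: Dict[str, Dict[str, Optional[str]]] = {
    "https://national.example.org/record/42": {"source": None},
    "https://eu.example.org/record/9": {"source": "https://national.example.org/record/42"},
    "https://other.example.org/record/7": {"source": "https://eu.example.org/record/9"},
    "https://other.example.org/record/8": {"source": None},
}


def original_of(uri: str, records: Dict[str, Dict[str, Optional[str]]]) -> str:
    """Follow dct:source links until a record with no further source is reached."""
    visited = set()
    while True:
        if uri in visited:
            raise ValueError(f"dct:source cycle involving {uri}")
        visited.add(uri)
        # A source that was never harvested is treated as the original itself.
        source = records.get(uri, {}).get("source")
        if source is None:
            return uri
        uri = source


def group_by_original(records: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Map each original record URI to all harvested records derived from it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in records:
        groups[original_of(uri, records)].append(uri)
    return dict(groups)


for original, derived in group_by_original(records).items():
    print(original, "<-", derived)
```

Because the aggregator's own records are kept and only linked back to their source, enrichments added along the way are not lost, which fits the point that DCAT has no notion of an immutable metadata record.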

bertvannuffelen commented 4 years ago

In addition to @makxdekkers' comments: the property dct:source is retained in DCAT-AP 2.0, although DCAT 2.0 includes an example showing how this case could be addressed using PROV-O; see https://w3c.github.io/dxwg/dcat/#examples-dataset-provenance

@idevisser Note that if the Linked Data principles were followed, the URI of the dataset would resolve to the original dataset description. A similar choice could be made for other entities, such as the catalog record.

At the risk of opening a long debate, I would like to come back to your motivation for the request. Personally, I consider the 'fear of duplicates' a false problem. It is better to acknowledge that, for the end user (the one reusing the data), there are many similar datasets (and hence dataset descriptions). The challenge for data portals is then to offer end users smart search services that take these similarities into account and help them select the right dataset.

For example, the European Union Open Data Portal publishes all datasets from Eurostat, which in turn are (or should be) a simple concatenation of the data from the member states, which in turn concatenate the data from the regional and, below that, the local level. For the end user, the EDP (European Data Portal), which harvests the EU ODP and the member states' data portals, may then feel like a redundant republishing of statistical data if the relationships are not made clear.

The question then becomes: does DCAT-AP offer sufficient metadata to support building such a smart service? In my view, at least enough metadata fields are available to make it happen (even more in DCAT-AP 2.0), but we may have to agree on how to use them so that smart search services become easier to build. I propose that as a topic for a future release.
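A minimal sketch of the kind of grouping such a search service might perform, assuming each hit carries a title and optionally an identifier. The `Hit` type, `similarity_key`, `group_hits` and the sample data are hypothetical, and the title fallback is a deliberately crude heuristic, not a recommended matching rule. Unlike deduplication, it discards nothing: similar descriptions are only clustered so that the user can still choose among them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Hit:
    # Hypothetical search hit; the fields loosely mirror DCAT-AP properties.
    portal: str
    title: str                        # dct:title
    identifier: Optional[str] = None  # dct:identifier, if the publisher set one


def similarity_key(hit: Hit) -> Tuple[str, str]:
    # Assumption: hits sharing an identifier describe the same dataset; without
    # one, fall back to a crude case- and whitespace-insensitive title match.
    if hit.identifier:
        return ("id", hit.identifier)
    return ("title", " ".join(hit.title.lower().split()))


def group_hits(hits: List[Hit]) -> List[List[Hit]]:
    """Group similar hits in order of first appearance; nothing is discarded."""
    groups: Dict[Tuple[str, str], List[Hit]] = {}
    for hit in hits:
        groups.setdefault(similarity_key(hit), []).append(hit)
    return list(groups.values())


hits = [
    Hit("EU ODP", "Population by region", "example-id-1"),
    Hit("EDP", "Population by region", "example-id-1"),
    Hit("national portal", "Population by Region"),
    Hit("regional portal", "population by region"),
]
for group in group_hits(hits):
    print(group[0].title, "->", [h.portal for h in group])
```

The two hits without an identifier end up in a separate group from the two that share one, even though all four have the same title. That gap illustrates why agreeing on how the existing fields are used would matter for such a service.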

bertvannuffelen commented 4 years ago

resolution: feedback provided

SEMICeu / DCAT-AP

metadata identifier property #106