Attribute for connection between distribution and dataset. BelongToDataset

albertoabellagarcia commented 1 year ago

Although it is easy to find which distributions are linked to a dataset, the opposite lookup (from a distribution to its dataset) is not possible. In Smart Data Models we have extended DCAT 2.1.1 with such an attribute at the request of users, who created the attribute belongToDataset to cover it. More info about it is in this link and the generated specification.

bertvannuffelen commented 1 year ago

@albertoabellagarcia, I am very reluctant to introduce inverse properties just because of technical limitations in implementations.

Observe that inverse properties come with an implementation cost as well.

Consider the harvesting of 2 datasets:

ex1:dataset1 dcat:distribution ex1:dist1.

and

ex2:dist2 new:belongsToDataset ex2:dataset2.

Then in order for a portal to show all distributions of a dataset, the algorithm to be applied is:

1. find all dcat:distribution values for a dataset
2. find all distributions for which the property new:belongsToDataset holds with that dataset as its value

If portals do not want to change their behavior, then the harvester has to apply the above algorithm and insert the missing inverse triple in each case: a new:belongsToDataset triple for every distribution found in case 1, and a dcat:distribution triple for every distribution found in case 2.
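The two-step lookup and the harvester-side materialization can be sketched as follows. This is a minimal illustration, not an implementation of any existing harvester: triples are modelled as plain Python tuples, and the IRI chosen for new:belongsToDataset is a hypothetical placeholder, since no namespace was given for the proposed property.

```python
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"
# Hypothetical IRI: the "new:" prefix is not bound to a namespace in the discussion.
BELONGS_TO_DATASET = "https://example.com/new#belongsToDataset"


def distributions_of(triples, dataset):
    """Apply both lookups: the forward property and the proposed inverse."""
    forward = {o for s, p, o in triples if s == dataset and p == DCAT_DISTRIBUTION}
    inverse = {s for s, p, o in triples if o == dataset and p == BELONGS_TO_DATASET}
    return forward | inverse


def materialize_both(triples):
    """Harvester-side fix: add the missing direction for every link."""
    closed = set(triples)
    for s, p, o in triples:
        if p == DCAT_DISTRIBUTION:
            closed.add((o, BELONGS_TO_DATASET, s))
        elif p == BELONGS_TO_DATASET:
            closed.add((o, DCAT_DISTRIBUTION, s))
    return closed


harvested = {
    ("ex1:dataset1", DCAT_DISTRIBUTION, "ex1:dist1"),
    ("ex2:dist2", BELONGS_TO_DATASET, "ex2:dataset2"),
}
```

After materialize_both, a portal that only queries dcat:distribution would also find ex2:dist2, at the price of the harvester doubling the number of link triples it stores.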

Inverse properties will impact current implementations.

Note that in JSON(-LD) this can be solved without introducing a new property:

{
  "@context" : {
    "belongsToDataset" : {
      "@reverse" : "http://www.w3.org/ns/dcat#distribution",
      "@type": "@id"
    },
    "Distribution": "http://www.w3.org/ns/dcat#Distribution"
  },
  "@id" : "https://example.com/dist2",
  "@type": "Distribution",
  "belongsToDataset": "https://example.com/dataset2"
}

Thus the above JSON(-LD) contains a property with an attribute name that does not occur in DCAT, but the JSON-LD interpretation fixes its semantics within the DCAT space: the "@reverse" term definition means the document states the triple dataset2 dcat:distribution dist2. The need for introducing an inverse property in the data specification therefore disappears, as the technical format addresses this.
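To make the effect of "@reverse" concrete, here is a small sketch that reads the document above and emits the triple it stands for. It is a hand-written stand-in covering only this one term shape (a reverse property with string IRI values), not a JSON-LD processor; a real implementation would rely on a conformant JSON-LD library. The function name reverse_triples is illustrative only.

```python
import json

doc = json.loads("""
{
  "@context": {
    "belongsToDataset": {
      "@reverse": "http://www.w3.org/ns/dcat#distribution",
      "@type": "@id"
    },
    "Distribution": "http://www.w3.org/ns/dcat#Distribution"
  },
  "@id": "https://example.com/dist2",
  "@type": "Distribution",
  "belongsToDataset": "https://example.com/dataset2"
}
""")


def reverse_triples(document):
    """Return (subject, predicate, object) triples implied by @reverse terms."""
    context = document["@context"]
    node_id = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @id and @type
        definition = context.get(key)
        if isinstance(definition, dict) and "@reverse" in definition:
            values = value if isinstance(value, list) else [value]
            for subject in values:
                # Reverse property: the value is the subject, this node the object.
                triples.append((subject, definition["@reverse"], node_id))
    return triples
```

The only triple produced uses dcat:distribution with the dataset as subject, so a consumer working on the RDF level never sees a belongsToDataset property at all.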

I would rather keep the specification as simple as possible, without the need to add an inverse reasoning interpretation.

bertvannuffelen commented 1 year ago

Observe: this inverse-property situation already occurs for DatasetSeries (dcat:next, dcat:prev), where implementers first have to build the closure graph before they can put navigation links between the Datasets in a DatasetSeries. Unless an explicit direction is chosen and it is clear that the other can be trivially inferred from it, closures appear.
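As an illustration of that closure step, the sketch below derives both navigation directions from whichever of dcat:prev or dcat:next a publisher happened to supply. Triples are again plain tuples, and the dataset identifiers and the navigation_links helper are made up for the example.

```python
DCAT_PREV = "http://www.w3.org/ns/dcat#prev"
DCAT_NEXT = "http://www.w3.org/ns/dcat#next"


def navigation_links(triples):
    """Build {dataset: {"prev": ..., "next": ...}} from either direction."""
    links = {}
    for s, p, o in triples:
        if p == DCAT_PREV:
            earlier, later = o, s  # s dcat:prev o: o comes before s
        elif p == DCAT_NEXT:
            earlier, later = s, o  # s dcat:next o: o comes after s
        else:
            continue
        links.setdefault(later, {})["prev"] = earlier
        links.setdefault(earlier, {})["next"] = later
    return links


# Hypothetical series in which only the middle dataset carries the links.
series = [
    ("ex:d2", DCAT_PREV, "ex:d1"),
    ("ex:d2", DCAT_NEXT, "ex:d3"),
]
links = navigation_links(series)
```

Every consumer that wants previous/next buttons has to run some variant of this pass over the whole series before rendering a single dataset page.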

matthiaspalmer commented 1 year ago

I agree with @bertvannuffelen: introducing mandatory inverse properties comes with a cost and should be avoided. That being said, nothing hinders implementors / regional actors from introducing additional "efficiency" triples that are useful for them. It's just that you cannot expect everyone else to understand them, as they are not part of the DCAT or DCAT-AP specifications. E.g., such triples will be ignored by upstream portals. In fact, the Swedish portal relies on extra "efficiency" triples that are added during the harvesting process in order to work, and clearly we do not expect anyone else to understand these triples. The "efficiency" triples use a separate, dedicated namespace for properties and a few classes to make sure that they do not clash with anything.

Would a similar approach with extra "efficiency" triples work for you @albertoabellagarcia?

bertvannuffelen commented 8 months ago

Unless more suggestions and feedback arrive that allow this issue to progress, the proposal is to close the issue with release 3.0.0.

SEMICeu / DCAT-AP

Attribute for connection between distribution and dataset. BelongToDataset #270