SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

The domain and range of adms:sample #156

Closed aidig closed 6 months ago

aidig commented 4 years ago

The domain and range of adms:sample are restricted to adms:Asset (semantic asset), and adms:Asset is a subclass of dcat:Dataset. See http://www.w3.org/ns/adms.ttl

<http://www.w3.org/ns/adms#sample> dcterms:identifier "adms:sample";
    a rdf:Property, owl:ObjectProperty;
    rdfs:comment "Links to a sample of an Asset (which is itself an Asset)."@en;
    rdfs:domain <http://www.w3.org/ns/adms#Asset>;
    rdfs:isDefinedBy <http://www.w3.org/TR/vocab-adms>;
    rdfs:label "sample"@en;
    rdfs:range <http://www.w3.org/ns/adms#Asset>.

Not all datasets are (semantic) assets, seeing that semantic assets in ADMS are considered to be highly reusable metadata (e.g. xml schemata, generic data models) and reference data (e.g. code lists, taxonomies, dictionaries, vocabularies). (ADMS 2013)

If this property is indeed needed in DCAT-AP (see the discussion in the issue "dcat:Dataset - remove adms:sample?" #10), it might be beneficial to explore other generic solutions for referring to dataset samples/excerpts.

aidig commented 4 years ago

Furthermore, in DCAT-AP the range of adms:sample is dcat:Distribution, whereas the ADMS definition of adms:sample seems to declare the sample to be a semantic asset (a dataset) as well.

bertvannuffelen commented 3 years ago

This is a case where we hit the boundaries of reuse.

From https://www.w3.org/TR/vocab-adms/: the definition for adms:sample is "Links to a sample of an Asset (which is itself an Asset)."
And adms:Asset is a subclass of dcat:Dataset (see 5.1.2 Asset).

Consequently, the range is a dcat:Dataset.

The DCAT-AP specification states that the range is dcat:Distribution. So from this knowledge, one can infer that an entity referred to by adms:sample in the usage context of DCAT-AP is both a dcat:Dataset and a dcat:Distribution.

According to the RDF definitions of DCAT, this is not a contradiction. "Expectation-wise", however, there is one. The DCAT specification says (see section B, Alignment with Schema.org) that "the distinction between (abstract) Dataset and (concrete) DataDownload matches dcat:Dataset / dcat:Distribution", and (section 2, Motivation for change) that "It made an important distinction between a dataset as an abstract idea and a distribution as a manifestation of the dataset."

An additional argument: the RDF specification of ADMS also includes an rdfs:domain statement for adms:sample (not present in the HTML version), which makes every DCAT-AP dataset that provides a sample also an ADMS asset. This may create an expectation mismatch as well.
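To make the two inferences above concrete, here is a minimal sketch in plain Python, not a full RDFS reasoner. It applies only the RDFS rules for rdfs:domain, rdfs:range and rdfs:subClassOf to prefixed names held as strings. The IRIs ex:dataset and ex:sampleFile are hypothetical, and writing the DCAT-AP range as an rdfs:range triple is an assumption made for illustration.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rdfs2 (domain), rdfs3 (range) and rdfs9 (subClassOf) until a fixpoint."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))   # subject gets the domain class
                elif p2 == RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))   # object gets the range class
                elif p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))   # instance of subclass is instance of superclass
        if new <= closed:
            return closed
        closed |= new


def types_of(node, triples):
    """Collect all classes asserted or inferred for a node."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


vocabulary = {
    # From adms.ttl, as quoted earlier in this issue.
    ("adms:sample", DOMAIN, "adms:Asset"),
    ("adms:sample", RANGE, "adms:Asset"),
    ("adms:Asset", SUBCLASS, "dcat:Dataset"),
    # Range stated by DCAT-AP, expressed here as rdfs:range (assumption).
    ("adms:sample", RANGE, "dcat:Distribution"),
}
data = {("ex:dataset", "adms:sample", "ex:sampleFile")}  # hypothetical IRIs

inferred = rdfs_closure(vocabulary | data)
sample_types = types_of("ex:sampleFile", inferred)
dataset_types = types_of("ex:dataset", inferred)
print(sorted(sample_types))
print(sorted(dataset_types))
```

Under these assumptions the sample ends up typed as adms:Asset, dcat:Dataset and dcat:Distribution, and the dataset that points to it ends up typed as adms:Asset.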

Observe that from the logical perspective there is no problem. From the intentional perspective, however, there is. So from the way the specification was built it is unclear which reuse of adms:sample was intended:

Within the SEMIC team these challenges around reuse have been raised, and we are working towards an approach that would at least make the notion of reuse clearer.

bertvannuffelen commented 3 years ago

As a general resolution: DCAT itself proposes to use qualified relations for cases like this; see https://w3c.github.io/dxwg/dcat/#qualified-forms
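As a rough illustration of the qualified form, the sketch below builds the triples that would link a dataset to a sample through an intermediate dcat:Relationship node. The terms dcat:qualifiedRelation, dcat:Relationship, dcterms:relation and dcat:hadRole come from DCAT; the role ex:sampleRole and the other ex: IRIs are hypothetical, and nothing in this issue says DCAT-AP adopted this pattern for samples.

```python
def qualified_relation(source, target, role, node):
    """Return triples linking source to target through a dcat:Relationship node."""
    return [
        (source, "dcat:qualifiedRelation", node),
        (node, "rdf:type", "dcat:Relationship"),
        (node, "dcterms:relation", target),
        (node, "dcat:hadRole", role),
    ]


def to_turtle(triples):
    """Serialise triples as one Turtle statement per line (prefix declarations omitted)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# ex:dataset, ex:sampleFile and ex:sampleRole are hypothetical IRIs.
triples = qualified_relation("ex:dataset", "ex:sampleFile", "ex:sampleRole", "_:rel1")
print(to_turtle(triples))
```

Because the role is carried by a separate resource, no domain or range has to be added to a property from another vocabulary.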

jakubklimek commented 3 years ago

@bertvannuffelen this is somewhat related to a broader discussion held in the Dataset Exchange Working Group during DCAT2 development.

The core of the issue was whether or not it was OK to add statements to Classes and Properties of external vocabularies, and who would be affected by them. E.g. whether saying dcterms:title rdfs:label "Dataset title"@en would affect only those working with DCAT or, in fact, everyone using dcterms:title even without DCAT, e.g. for book titles, which is undesirable. Some were of the opinion that it is up to the users of DCAT to be aware of the context, i.e. that this statement would be valid "only in the context of DCAT". However, there is no machine-readable way of saying this, and in my opinion having those statements in a separate file does not solve it.

This is similar to the statement above:

one can infer that an entity referred by adms:sample in the usage context of DCAT-AP

What is the usage context? If I had a triplestore with all relevant vocabularies loaded, and I used adms:sample for data unrelated to DCAT, I would still see the objects of this relation as dcat:Distributions, which is both unexpected and undesirable.

My opinion is that to avoid such problems, existing Classes and Properties should be reused only if their semantics match exactly, i.e. no changes/additional definitions of domains and ranges. In all other cases, a subclass or subproperty needs to be created to carry the additional semantics.

In the DCAT case, all annotations of external vocabularies were separated into an additional file, which was the best compromise we could reach. But the example here with adms:sample is another instance of the same problem. I think the property used here should have been a subproperty of adms:sample, rather than adms:sample itself.
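The difference between the two options can be sketched in a few lines of plain Python. This is only an illustration under assumptions: ex:sample is a hypothetical subproperty, the other ex: IRIs are hypothetical data, and only the RDFS rules for rdfs:subPropertyOf and rdfs:range are applied.

```python
RANGE = "rdfs:range"
SUBPROP = "rdfs:subPropertyOf"


def infer_types(vocabulary, data):
    """Apply rdfs7 (subPropertyOf), then rdfs3 (range); return {object: set of classes}."""
    facts = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(facts):
            for sub, rel, sup in vocabulary:
                if rel == SUBPROP and sub == p and (s, sup, o) not in facts:
                    facts.add((s, sup, o))  # the statement also holds for the superproperty
                    changed = True
    types = {}
    for s, p, o in facts:
        for prop, rel, cls in vocabulary:
            if rel == RANGE and prop == p:
                types.setdefault(o, set()).add(cls)
    return types


adms = {("adms:sample", RANGE, "adms:Asset")}

# Option A: a second range is added to adms:sample itself.
direct_reuse = adms | {("adms:sample", RANGE, "dcat:Distribution")}

# Option B: a subproperty carries the extra range (ex:sample is hypothetical).
subproperty = adms | {
    ("ex:sample", SUBPROP, "adms:sample"),
    ("ex:sample", RANGE, "dcat:Distribution"),
}

unrelated = {("ex:codeList", "adms:sample", "ex:excerpt")}  # data unrelated to DCAT
profile = {("ex:dataset", "ex:sample", "ex:sampleFile")}    # data using the subproperty

types_a = infer_types(direct_reuse, unrelated)
types_b = infer_types(subproperty, unrelated | profile)
print(types_a)
print(types_b)
```

With option A the unrelated ex:excerpt is inferred to be a dcat:Distribution; with option B it is not. Note that in option B the profile's own sample is still inferred to be an adms:Asset through the superproperty, so a subproperty isolates other users of adms:sample but does not remove the original ADMS range.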

bertvannuffelen commented 3 years ago

@jakubklimek thanks for the pointer to the discussion in the DCAT WG.

Personally, I am in favor of a strict interpretation of reuse, so that the logical inference and the intuition match. However, I am open as to how to achieve that (e.g. as shown by the discussion). The DCAT discussion adds another idea for this.

One point I would like to make is that we should try to be as transparent as possible about the choices made here, so as to reduce discussions about interpretation.

bertvannuffelen commented 1 year ago

In the candidate release for ADMS, the domain and range of adms:sample have been broadened. See https://semiceu.github.io/ADMS/releases/2.00/

init-dcat-ap-de commented 1 year ago

Does this mean that a) we broaden the range of adms:sample in DCAT-AP 3.0 as well, or b) ADMS is now simply in line with DCAT-AP's range of dcat:Distribution, so we change nothing?

As far as I understand: b).

bertvannuffelen commented 1 year ago

Indeed, this is a case where we have aligned the ADMS vocabulary with the usage in DCAT-AP so that there is no semantic conflict. So there is no impact on DCAT-AP.

bertvannuffelen commented 6 months ago

With the publication of the release https://semiceu.github.io/ADMS/releases/2.00/ we close this issue.