SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Add guidance on usage of additional themes for datasets #316

Open jakubklimek opened 8 months ago

jakubklimek commented 8 months ago

Based on https://github.com/SEMICeu/DCAT-AP/issues/314, https://github.com/SEMICeu/DCAT-AP/issues/207, the description of dcat:theme, and the chapter on Other controlled vocabularies, I think the usage of additional dataset themes needs clarification, including examples. It is clear that the Data theme vocabulary needs to be used for dcat:theme. What is unclear from the current state of DCAT-AP 3.0.0, and where the discussions are not yet concluded, is how additional themes should be used. Let's say I want to use Eurovoc in addition to the Data theme vocabulary. What do I do?

The usage note for dcat:theme says: "The values to be used for this property are the URIs of the concepts in the vocabulary." It is unclear whether it is ONLY values from the vocabulary, or AT LEAST ONE value from this vocabulary.
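To make the two readings concrete, here is a minimal Python sketch of both checks. It assumes, purely for illustration, that membership in the Data theme vocabulary can be recognised by URI prefix; the function names are hypothetical and not part of DCAT-AP or any validator.

```python
# Assumption: a concept belongs to the Data theme vocabulary if its URI
# starts with this prefix. A real validator would likely check the
# vocabulary itself rather than the URI shape.
DATA_THEME_NS = "http://publications.europa.eu/resource/authority/data-theme/"


def only_from_data_theme(themes):
    """Strict reading: every dcat:theme value is a Data theme concept."""
    return len(themes) > 0 and all(t.startswith(DATA_THEME_NS) for t in themes)


def at_least_one_from_data_theme(themes):
    """Lenient reading: at least one dcat:theme value is a Data theme concept."""
    return any(t.startswith(DATA_THEME_NS) for t in themes)


# dcat:theme values of a dataset that mixes Data theme and Eurovoc.
themes = [
    "http://publications.europa.eu/resource/authority/data-theme/TRAN",
    "http://eurovoc.europa.eu/1001",
]

print(only_from_data_theme(themes))          # False
print(at_least_one_from_data_theme(themes))  # True
```

The same dataset fails under the ONLY reading and passes under the AT LEAST ONE reading, which is why the usage note needs to say which one is meant.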

Option 1 (implemented in the Czech National Open Data Catalog): Use dcat:theme also for Eurovoc, e.g.:

<dataset1> dcat:theme <http://publications.europa.eu/resource/authority/data-theme/TRAN> .
<dataset1> dcat:theme <http://eurovoc.europa.eu/1001> .

This seems to be discouraged by @bertvannuffelen in https://github.com/SEMICeu/DCAT-AP/issues/207#issuecomment-1700613026, which instead suggests creating subproperties of dcat:theme and enforcing the policy that dcat:theme takes ONLY values from the Data theme vocabulary. However, as I mentioned in https://github.com/SEMICeu/DCAT-AP/issues/314#issuecomment-1765711454, I do not think these two approaches go together: from the RDF point of view, the values of a subproperty can also be interpreted as values of the superproperty, i.e. dcat:theme, which violates the constraint.
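The subproperty argument can be sketched in a few lines of Python. This is only an illustration of the RDFS entailment rule rdfs7 over plain tuples, not a full reasoner; ex:eurovocTheme is a hypothetical subproperty invented for the example.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def apply_rdfs7(triples):
    """Apply rdfs7 until nothing new is inferred:
    (p rdfs:subPropertyOf q) and (s p o) entail (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


triples = {
    # Hypothetical subproperty of dcat:theme for Eurovoc concepts.
    ("ex:eurovocTheme", SUBPROPERTY_OF, "dcat:theme"),
    ("ex:dataset1", "dcat:theme",
     "http://publications.europa.eu/resource/authority/data-theme/TRAN"),
    ("ex:dataset1", "ex:eurovocTheme", "http://eurovoc.europa.eu/1001"),
}

# Collect every dcat:theme value of the dataset after entailment.
theme_values = sorted(
    o for s, p, o in apply_rdfs7(triples)
    if s == "ex:dataset1" and p == "dcat:theme"
)
for value in theme_values:
    print(value)
```

After entailment, the Eurovoc concept shows up as a dcat:theme value as well, so an ONLY constraint on dcat:theme would be violated by any consumer that applies RDFS reasoning.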

Option 2: Use dct:subject for other vocabularies, e.g.:

<dataset1> dcat:theme <http://publications.europa.eu/resource/authority/data-theme/TRAN> .
<dataset1> dct:subject <http://eurovoc.europa.eu/1001> .

This is another approach suggested by @bertvannuffelen in https://github.com/SEMICeu/DCAT-AP/issues/314#issuecomment-1764792636, which does not create any problems. However, it is not mentioned anywhere in DCAT-AP.

I think this shows the need for a decision and clearer guidance on how additional dataset themes should be used in DCAT-AP.

matthiaspalmer commented 8 months ago

Perhaps some inspiration can be taken from GeoDCAT-AP where both dcat:theme and dcterms:subject are used. See the controlled vocabulary section for an overview.

There is a bit of discussion in B.6.8 about why different expressions have been mapped to either dcat:theme or dcterms:subject. However, I fail to see a clear argument for which property should be used when, beyond the need to keep the use of each vocabulary separate (which will fail if we also have a dcat:theme on Data Services, which I think we should have).

jakubklimek commented 8 months ago

Also in BRegDCAT-AP, both controlled vocabularies (Data theme and Eurovoc) are available for Dataset.

bertvannuffelen commented 5 months ago

I propose to take this out of the release of DCAT-AP 3.

This discussion is also about the interpretation of expressing additional constraints on properties.

E.g., is there a difference in the expectation of the MUST between the statement that the property bankAccountNR must have an IBAN structure (which could be expressed as a literal with a specific regular expression) and the statement that a property theme must adhere to the NAL:datatheme?

sirex commented 4 months ago

SKOS has skos:ConceptScheme to define whether a concept is part of a controlled vocabulary or not.

For example, in the EU PO data-theme controlled vocabulary, each concept specifies the scheme it is part of:

<rdf:Description rdf:about="http://publications.europa.eu/resource/authority/data-theme/AGRI">
  <skos:inScheme rdf:resource="http://publications.europa.eu/resource/authority/data-theme"/>
  <skos:topConceptOf rdf:resource="http://publications.europa.eu/resource/authority/data-theme"/>
</rdf:Description>

I don't understand why the vocabulary restriction is put on dcat:theme. As I understand it, the vocabulary should be determined using skos:inScheme.

Validators could then consider only those concepts that are in a specific scheme (controlled vocabulary), as specified with skos:inScheme.
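A rough sketch of such a validator step in Python, with triples as plain tuples. The function name is hypothetical, and the skos:inScheme statement for TRAN is assumed to mirror the AGRI example above.

```python
DATA_THEME_SCHEME = "http://publications.europa.eu/resource/authority/data-theme"
TRAN = "http://publications.europa.eu/resource/authority/data-theme/TRAN"
EUROVOC_1001 = "http://eurovoc.europa.eu/1001"


def themes_in_scheme(triples, dataset, scheme):
    """Return the dcat:theme values of `dataset` that are declared
    to be in `scheme` via skos:inScheme; all other values are ignored."""
    in_scheme = {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}
    return [
        o for s, p, o in triples
        if s == dataset and p == "dcat:theme" and o in in_scheme
    ]


triples = [
    # Vocabulary data (assumed to follow the same pattern as AGRI).
    (TRAN, "skos:inScheme", DATA_THEME_SCHEME),
    # Dataset description mixing two vocabularies on dcat:theme.
    ("ex:dataset1", "dcat:theme", TRAN),
    ("ex:dataset1", "dcat:theme", EUROVOC_1001),
]

data_themes = themes_in_scheme(triples, "ex:dataset1", DATA_THEME_SCHEME)
print(data_themes)
# The dataset is acceptable if at least one Data theme concept is present.
print(len(data_themes) >= 1)  # True
```

With this approach the Eurovoc value is simply not considered when checking the Data theme requirement, so it does not need to be moved to a different property.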

I also agree with @jakubklimek (https://github.com/SEMICeu/DCAT-AP/issues/314#issuecomment-1765711454): if subproperties of dcat:theme are required, then dcat:theme itself should not carry a vocabulary restriction, and a :semicTheme subproperty should also be used to enforce the vocabulary.