SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Controlled vocabulary to be used with `dcat:themeTaxonomy`? #207

Closed init-dcat-ap-de closed 4 weeks ago

init-dcat-ap-de commented 2 years ago

In addition to #196, on its own:

Doesn't the usage note "The value to be used for this property is the URI of the vocabulary itself, i.e. the concept scheme, not the URIs of the concepts in the vocabulary." mean that

_:myDS dcat:themeTaxonomy <http://publications.europa.eu/resource/dataset/data-theme> .

is the only allowed way to use dcat:themeTaxonomy? Because we MUST use <http://publications.europa.eu/resource/dataset/data-theme>?

So what's the value added?

Also: http://publications.europa.eu/resource/dataset/data-theme is probably wrong. Shouldn't it be http://publications.europa.eu/resource/authority/data-theme?

bertvannuffelen commented 2 years ago

The issue is situated in section 5.2.

Indeed your interpretation is correct:

As there is no codelist of acceptable theme-codelists, I propose to drop the row in the table in section 5.2


dcat:themeTaxonomy | Catalogue | Dataset Theme Vocabulary | http://publications.europa.eu/resource/authority/data-theme | The value to be used for this property is the URI of the vocabulary itself, i.e. the concept scheme, not the URIs of the concepts in the vocabulary.
-- | -- | -- | -- | --

with an addition to the usage note in the table in section 4.1.2 for the property dcat:themeTaxonomy


themes | dcat:themeTaxonomy | skos:ConceptScheme | This property refers to a knowledge organization system used to classify the Catalogue's Datasets. | 0..n
-- | -- | -- | -- | --

The new usage note would be:

This property refers to a knowledge organization system used to classify the Catalogue's Datasets. It must have at least the value NAL:data-theme as this is the mandatory controlled vocabulary for dcat:theme.

An alternative formulation could be to express this requirement below the table in 5.2.

init-dcat-ap-de commented 12 months ago

https://semiceu.github.io//DCAT-AP/releases/3.0.0#Catalogue.themes

https://semiceu.github.io/DCAT-AP/releases/3.0.0/#controlled-vocabularies-to-be-used

Multiple issues with this:

1. If dcat:themeTaxonomy "must have at least the value NAL:data-theme" the cardinality should be 1..*, not 0..*.

2. I am still not convinced that there is any value added by forcing this statement.

3. Since "controlled vocabularies (..) MUST be used for the listed properties" there is no chance to use dcat:theme with anything other than NAL:data-theme.

4. If we can't use dcat:theme with anything other than NAL:data-theme, there is no need to allow multiple dcat:themeTaxonomy.

bertvannuffelen commented 10 months ago

@init-dcat-ap-de

1. If `dcat:themeTaxonomy` "must have at least the value NAL:data-theme" the cardinality should be `1..*`, not `0..*`.

I agree this is the right consequence.
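The agreed consequence (cardinality `1..*` with NAL:data-theme as a required value) can be sketched as a small check. This is only an illustrative sketch in plain Python: triples are modelled as `(subject, predicate, object)` tuples of IRI strings, and the function name `check_theme_taxonomy` and the catalogue IRI `http://example.org/catalogue` are hypothetical, not part of DCAT-AP or of any validator.

```python
# Assumption: triples are plain (subject, predicate, object) tuples of IRI strings.
DCAT_THEME_TAXONOMY = "http://www.w3.org/ns/dcat#themeTaxonomy"
EU_DATA_THEME = "http://publications.europa.eu/resource/authority/data-theme"


def check_theme_taxonomy(triples, catalogue):
    """Return a list of problems for the catalogue's dcat:themeTaxonomy values."""
    values = {o for s, p, o in triples
              if s == catalogue and p == DCAT_THEME_TAXONOMY}
    problems = []
    if not values:
        # Cardinality 1..* requires at least one value.
        problems.append("no dcat:themeTaxonomy value (cardinality 1..*)")
    if EU_DATA_THEME not in values:
        # The proposed usage note requires NAL:data-theme among the values.
        problems.append("mandatory value NAL:data-theme is missing")
    return problems


# Hypothetical catalogue used only for illustration.
CATALOGUE = "http://example.org/catalogue"
ok_triples = [(CATALOGUE, DCAT_THEME_TAXONOMY, EU_DATA_THEME)]
```

With `ok_triples` the check reports no problems; for a catalogue without any `dcat:themeTaxonomy` statement it reports both.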

Note that this change only makes more explicit what was written in DCAT-AP 2.x. The next questions are related to the question of whether or not we should enforce the use of a single theme codelist.

Note also that this property is expressing a constraint on dcat:theme or could be inferred from the dcat:theme requirements. It could even be inferred from the actual instance data.
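To illustrate the remark that the value could be inferred from the actual instance data, here is a minimal sketch in plain Python. It assumes that the harvested theme concepts carry `skos:inScheme` statements, which is not guaranteed for every codelist. Triples are modelled as `(subject, predicate, object)` tuples, and the function name and the `http://example.org/...` IRIs are hypothetical.

```python
# Assumption: triples are plain (subject, predicate, object) tuples of IRI strings.
DCAT_THEME = "http://www.w3.org/ns/dcat#theme"
SKOS_IN_SCHEME = "http://www.w3.org/2004/02/skos/core#inScheme"
EU_DATA_THEME = "http://publications.europa.eu/resource/authority/data-theme"


def infer_theme_taxonomies(triples):
    """Return the concept schemes that the used dcat:theme values belong to."""
    # Map each concept to the schemes it declares via skos:inScheme.
    schemes_of = {}
    for s, p, o in triples:
        if p == SKOS_IN_SCHEME:
            schemes_of.setdefault(s, set()).add(o)
    # Collect the schemes of every concept used as a dcat:theme value.
    inferred = set()
    for s, p, o in triples:
        if p == DCAT_THEME:
            inferred |= schemes_of.get(o, set())
    return inferred


# Hypothetical harvested data: one theme from NAL:data-theme, one from another codelist.
example_triples = [
    ("http://example.org/dataset1", DCAT_THEME, EU_DATA_THEME + "/ENVI"),
    ("http://example.org/dataset1", DCAT_THEME, "http://example.org/nalB/themeB"),
    (EU_DATA_THEME + "/ENVI", SKOS_IN_SCHEME, EU_DATA_THEME),
    ("http://example.org/nalB/themeB", SKOS_IN_SCHEME, "http://example.org/nalB"),
]
```

The returned set could then populate `dcat:themeTaxonomy` on the Catalogue without anyone maintaining it by hand.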

This is information about a Catalogue, and in our exchanges we have seldom treated Catalogues as first-class citizens; they mostly serve as a minimal meta level to support harvesting. So the question can arise whether this property should be used at all. Either, as you mentioned, it is fixed and very narrowly interpreted, or it allows for any possible case (based purely on the harvested data).
In both cases, specifying this information upfront and maintaining it manually might be a waste of effort.

2. I am still not convinced that there is any value added by forcing this statement.

3. Since "controlled vocabularies (..) MUST be used for the listed properties" there is no chance to use dcat:theme with anything other than NAL:data-theme.

4. If we can't use `dcat:theme` with anything other than NAL:data-theme, there is no need to allow multiple `dcat:themeTaxonomy`.

These are good points to discuss. They lead into the complex matter of theming for multiple ecosystems. The general challenge is this: if Portal A wants to use dcat:theme with NAL A and Portal B with NAL B, and the same dataset has to be provided to both portals, what is the expected behavior?

Portal A expects

ex:dataset1 dcat:theme nalA:themeA.

Portal B expects

ex:dataset1 dcat:theme nalB:themeB.

If both are provided at the same time

ex:dataset1 dcat:theme nalA:themeA.
ex:dataset1 dcat:theme nalB:themeB.

Then the portals have to update their implementations to accept themes from a codelist they do not know, and thus ignore those themes in their theme search and UI.
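What "accept but ignore" could mean for a portal can be sketched as follows. This is an assumption-laden illustration: it identifies a codelist by IRI prefix, which only works when all concepts of a codelist share a namespace, and the function name and the `http://example.org/...` IRIs are hypothetical.

```python
def partition_themes(theme_iris, known_codelist_prefix):
    """Split theme IRIs into those from the portal's own codelist and the rest."""
    known = [t for t in theme_iris if t.startswith(known_codelist_prefix)]
    unknown = [t for t in theme_iris if not t.startswith(known_codelist_prefix)]
    return known, unknown


# Hypothetical dataset carrying themes for both Portal A and Portal B.
themes = [
    "http://example.org/nalA/themeA",
    "http://example.org/nalB/themeB",
]
# Portal A indexes only the first list and tolerates the second.
known_for_a, ignored_by_a = partition_themes(themes, "http://example.org/nalA/")
```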

This is also an issue for implementers of editors, because there might be constraints such as: pick only one value from NAL A and only one value from NAL B. The easiest approach in that case is to semantically disambiguate the use of dcat:theme based on the NAL to be used.

ex:nalAtheme rdfs:subPropertyOf dcat:theme .
ex:nalBtheme rdfs:subPropertyOf dcat:theme .

ex:dataset1 ex:nalAtheme nalA:themeA .
ex:dataset1 ex:nalBtheme nalB:themeB .

This approach lets themes from different codelists coexist smoothly. If a catalogue wants to group them together under dcat:theme, it can do so by inference over the rdfs:subPropertyOf relationship. That is then left to the implementers, rather than forcing them to implement a disambiguation algorithm.
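The grouping step described above can be sketched in a few lines of plain Python. It mirrors the RDFS entailment rule for `rdfs:subPropertyOf` (a statement made with a sub-property also holds for its super-property), but it is not a full RDFS reasoner. Triples are modelled as `(subject, predicate, object)` tuples, and the function name and the `http://example.org/...` IRIs are hypothetical.

```python
# Assumption: triples are plain (subject, predicate, object) tuples of IRI strings.
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DCAT_THEME = "http://www.w3.org/ns/dcat#theme"


def materialize_superproperty(triples, super_property):
    """Add a super_property triple for every triple that uses one of its sub-properties."""
    # Collect direct and transitive sub-properties of super_property.
    subprops = set()
    frontier = {super_property}
    while frontier:
        found = {s for s, p, o in triples
                 if p == RDFS_SUBPROPERTY_OF and o in frontier}
        found -= subprops            # guards against cycles
        found.discard(super_property)
        subprops |= found
        frontier = found
    inferred = {(s, super_property, o) for s, p, o in triples if p in subprops}
    return set(triples) | inferred


EX = "http://example.org/"
data = [
    (EX + "nalAtheme", RDFS_SUBPROPERTY_OF, DCAT_THEME),
    (EX + "nalBtheme", RDFS_SUBPROPERTY_OF, DCAT_THEME),
    (EX + "dataset1", EX + "nalAtheme", EX + "nalA/themeA"),
    (EX + "dataset1", EX + "nalBtheme", EX + "nalB/themeB"),
]
grouped = materialize_superproperty(data, DCAT_THEME)
```

After this step `grouped` contains both `dcat:theme` statements for `dataset1` in addition to the original triples, so a catalogue that wants the combined view gets it without the publisher having to state it.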

bertvannuffelen commented 5 months ago

The editorial aspects mentioned in this issue have been addressed. But the topic on the interpretation of additional constraints, i.e. will be taken out of this release. That discussion continues at #316.