Violations when using mandatory Controlled Vocabularies from the Publications Office

jimjyang commented 3 years ago

Several controlled vocabularies from the Publications Office are mandatory whenever the corresponding properties are used. However, the values from these vocabularies are not typed as instances of the classes that BRegDCAT-AP specifies as ranges, so validation reports violations.

For example (this applies to many other controlled vocabularies as well):

The range of dct:spatial is dct:Location.
http://publications.europa.eu/resource/dataset/country is one of the vocabularies that are mandatory for dct:spatial.
The triple <> dct:spatial <http://publications.europa.eu/resource/authority/country/NOR> . gives a violation, because http://publications.europa.eu/resource/authority/country/NOR is not typed as a dct:Location.
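The sketch below shows a shape of the kind that would produce this violation. It assumes the range is enforced with sh:class. The official BRegDCAT-AP shapes may structure this differently, and ex:PublicOrganisationShape is a hypothetical name. sh:class only passes when the value node has rdf:type dct:Location (directly or through rdfs:subClassOf) in the graph being validated. The bare country IRI carries no such triple.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix cv:  <http://data.europa.eu/m8g/> .
@prefix ex:  <http://example.org/shapes#> .

# Hypothetical shape, for illustration only: not taken from the
# official BRegDCAT-AP shapes files.
ex:PublicOrganisationShape
  a sh:NodeShape ;
  sh:targetClass cv:PublicOrganisation ;
  sh:property [
    sh:path dct:spatial ;
    # Every value of dct:spatial must be a SHACL instance of dct:Location,
    # i.e. have rdf:type dct:Location (or a subclass) in the data graph.
    sh:class dct:Location
  ] .
```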

agmangas commented 3 years ago

This is indeed a recurrent issue. Currently, I can see only two options, neither of which is optimal:

Include redundant triples in the RDF datasets to explicitly provide the missing information. For example:

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix cv: <http://data.europa.eu/m8g/> .

<https://joinup.ec.europa.eu/collection/access-base-registries/solution/abr-bregdcat-ap#example-public-organisation>
  a cv:PublicOrganisation ;
  dct:identifier "ae1d152b-57b7-4e78-bf6f-fbe8ceb0af06" ;
  skos:prefLabel "Administración Pública del Gobierno de España"@es ;
  dct:spatial <http://publications.europa.eu/resource/authority/country/ESP> ;
  dct:title "Administración pública"@es .

<http://publications.europa.eu/resource/authority/country/ESP>
  a dct:Location ;
  skos:inScheme <http://publications.europa.eu/resource/authority/country> .
```

We've been using this approach. However, it increases the verbosity of BRegDCAT-AP datasets significantly.

Another option would be to use the ITB validator's ability to load additional complementary shapes on every validation run. The downside is the extra work of creating and maintaining those complementary shapes, along with some redundancy and the risk of misalignment with the main shapes.

costas80 commented 3 years ago

Wanted to add my opinion on this, based on prior experience. We have seen other cases in the past where validation results were wrong because "context" information was missing. Such context can be along the lines of what is highlighted here, or something more structural like "A is a subclass of B and B is a subclass of C", where shapes referring to "C" cannot succeed without the additional information.
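Here is a small illustration of the structural case. All ex: names are hypothetical, and the data, shape and context triples are shown in one block for brevity. sh:class follows rdf:type and then rdfs:subClassOf links (transitively) in the graph being validated. The constraint on ex:C is therefore only satisfied once the two subclass triples are available there. This assumes, as the approach described here implies, that the validator makes such context triples visible alongside the data.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/> .

# Data: the value is typed only with its most specific class.
ex:record ex:relatedTo ex:item .
ex:item a ex:A .

# Shape: values of ex:relatedTo must be SHACL instances of ex:C.
ex:RecordShape
  a sh:NodeShape ;
  sh:targetNode ex:record ;
  sh:property [
    sh:path ex:relatedTo ;
    sh:class ex:C
  ] .

# Context: without these two triples, ex:item is not a SHACL instance
# of ex:C and the constraint above reports a violation.
ex:A rdfs:subClassOf ex:B .
ex:B rdfs:subClassOf ex:C .
```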

Other users of the RDF validator (for other specs) have resolved this in a way similar to what @agmangas suggests, basically adding the extra "context" as part of the shapes graph. In the end, the additional overhead is justified because it makes the validator's results accurate.

To avoid such additional information "polluting" the main SHACL shapes, a good approach would be to put it in a separate RDF file rather than in the same one. That file can be added to the validator's configuration (each validation can map to any number of SHACL shape files) or included via owl:imports. In this specific case I would follow the first approach, as this is more of a workaround for a SHACL "weakness".
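A minimal sketch of such a separate context file follows. It assumes the validator merges the extra file's triples with the data before evaluating sh:class, which is what this approach relies on. The file name is hypothetical. In practice the entries would more likely be generated from the Publications Office vocabulary than written by hand.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical context file (e.g. "context-country.ttl"), kept apart from
# the main SHACL shapes and listed as an extra shape file in the validator
# configuration. It only supplies the typing the shapes expect.
<http://publications.europa.eu/resource/authority/country/ESP>
  a dct:Location ;
  skos:inScheme <http://publications.europa.eu/resource/authority/country> .

<http://publications.europa.eu/resource/authority/country/NOR>
  a dct:Location ;
  skos:inScheme <http://publications.europa.eu/resource/authority/country> .

# One entry per concept, for each mandatory controlled vocabulary.
```

For the owl:imports route, the main shapes file would instead contain a statement of the form <shapes-graph-IRI> owl:imports <context-file-IRI> . (both IRIs are placeholders here).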

agmangas commented 3 years ago

Thanks a lot for your input @costas80 😄

By all accounts it seems the most reasonable solution is to implement your proposal.

ISAITB / validator-resources-bregdcat-ap #2