SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

shacl - validating the use of controlled vocabularies #127

Closed aidig closed 2 years ago

aidig commented 4 years ago

Ref: https://github.com/SEMICeu/DCAT-AP/blob/master/releases/2.0.0/dcat-ap_2.0.0_shacl_shapes.ttl

Regarding the validation of the use of controlled vocabularies, might it be an idea to implement the constraints as SHACL patterns instead of the current method, which requires the presence of the range class in the data graph? (One example of many: line 392.)

    ], [
        sh:class dct:LinguisticSystem ;
        sh:path dct:language ;
        sh:severity sh:Violation
    ], [

For the property dct:language, the DCAT-AP specification (Ch. 5, Controlled Vocabularies) states that the EU Authority Table Languages MUST be used (URI: http://publications.europa.eu/resource/authority/language). Perhaps the sh:pattern property could be used to verify the URI structure, as suggested by Andrea Perego (https://github.com/SEMICeu/dcat-ap_shacl/wiki/Rules-of-Thumb; it does state that this is a slow process, but still...).

Suggestion (using the above property as an example)

    ], [
        sh:path dct:language ;
        sh:severity sh:Violation ;
        sh:pattern "^http://publications.europa.eu/resource/authority/language/" ;
        sh:flags "i" ;
    ], [

Thus validating the use of controlled vocabularies in this example (a slightly amended version of DCAT's basic example for EU use):

ex:dataset-001
  a dcat:Dataset ;
  dct:title "Dataset 001"@en ;
  dct:title "Datasæt 001"@da ;
  dct:description "A description of Dataset 001"@en ;
  dct:description "En beskrivelse af datasættet 001"@da ;
  dct:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/ANNUAL> ;
  dct:language <http://publications.europa.eu/resource/authority/language/DAN> ;
  dct:publisher ex:agent-001 ;
  dct:creator ex:agent-001 ;
  dct:issued "2011-12-05"^^xsd:date ;
  dct:modified "2011-12-15"^^xsd:date ;
  dcat:distribution ex:dataset-001-csv ;
  dcat:distribution ex:dataset-001-xml ;
.

ex:dataset-001-csv
  a dcat:Distribution ;
  dcat:accessURL <http://www.example.org/files/001.csv> ;
  dcat:downloadURL <http://www.example.org/files/001.csv> ;
  dct:title "CSV distribution of dataset 001"@en ;
  dct:title "CSV-distribution af datasættet 001"@da ;
  dcat:byteSize "5120"^^xsd:decimal ;
  dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> ;
  dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
.

ex:dataset-001-xml
  a dcat:Distribution ;
  dcat:accessURL <http://www.example.org/files/001.xml> ;
  dcat:downloadURL <http://www.example.org/files/001.xml> ;
  dct:title "XML distribution of dataset 001"@en ;
  dct:title "XML-distribution af datasættet 001"@da ;
  dcat:byteSize "6120"^^xsd:decimal ;
  dcat:mediaType <https://www.iana.org/assignments/media-types/text/xml> ;
  dct:format <http://publications.europa.eu/resource/authority/file-type/XML> ;
.

ex:agent-001
  a foaf:Agent;
  a foaf:Organization ;
  foaf:name "Organisation 001"@da ;  
  foaf:name "Organization 001"@en .

(Related issue 'shacl - Background knowledge for validation': https://github.com/SEMICeu/DCAT-AP/issues/125)

init-dcat-ap-de commented 4 years ago

Another way would be to convert the mandatory controlled vocabularies into SHACL lists and then use sh:in.

Both ways would be better than checking for the class, since that check would fail even if a SHACL validator followed the IRI for the language:

http://publications.europa.eu/resource/authority/language/DAN doesn't have the class dct:LinguisticSystem.
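
A minimal sketch of the sh:in variant, assuming the shape targets dcat:Dataset. The shape name and the ex: namespace are hypothetical, and only three entries of the authority table are listed for illustration; a real list would be generated from the complete table.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://www.example.org/shapes#> .   # hypothetical namespace

ex:DatasetLanguageInShape                          # hypothetical shape name
  a sh:NodeShape ;
  sh:targetClass dcat:Dataset ;
  sh:property [
    sh:path dct:language ;
    sh:severity sh:Violation ;
    # Every dct:language value must be one of the listed IRIs.
    # Illustrative subset only, not the full authority table.
    sh:in (
      <http://publications.europa.eu/resource/authority/language/DAN>
      <http://publications.europa.eu/resource/authority/language/DEU>
      <http://publications.europa.eu/resource/authority/language/ENG>
    ) ;
  ] .
```

Unlike sh:class, this needs no vocabulary triples in the data graph, but the list has to be regenerated whenever the authority table changes.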

bertvannuffelen commented 4 years ago

All good suggestions.

For a complete solution, the following requirements should be checked:

init-dcat-ap-de commented 4 years ago

Maybe we need to differentiate:

If it is allowed to use other vocabularies but one is obligatory, the SHACL rule has to be constructed in such a way that at least one of the values uses the obligatory vocabulary.
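
One possible way to express "at least one value from the obligatory vocabulary" is a qualified value shape. The following is only a sketch: the shape name and ex: namespace are hypothetical, and membership is approximated by the URI pattern discussed earlier in this thread rather than by a real lookup.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://www.example.org/shapes#> .   # hypothetical namespace

ex:DatasetLanguageAtLeastOneShape                  # hypothetical shape name
  a sh:NodeShape ;
  sh:targetClass dcat:Dataset ;
  sh:property [
    sh:path dct:language ;
    sh:severity sh:Violation ;
    # At least one value must conform to the nested shape;
    # further values from other vocabularies are tolerated.
    sh:qualifiedValueShape [
      sh:nodeKind sh:IRI ;
      sh:pattern "^http://publications\\.europa\\.eu/resource/authority/language/" ;
    ] ;
    sh:qualifiedMinCount 1 ;
  ] .
```

Note that sh:qualifiedMinCount 1 also reports a violation when a dataset has no dct:language at all, so for an optional property the rule would need to be relaxed accordingly.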

bertvannuffelen commented 2 years ago

The DCAT-AP SHACL validator (https://www.itb.ec.europa.eu/shacl/dcat-ap/upload) provides an option to validate solely the membership of values in a controlled vocabulary. The configuration of the validator can be found at https://github.com/ISAITB/validator-resources-dcat-ap. This configuration refers to the SHACL template file https://github.com/SEMICeu/DCAT-AP/blob/2.1.0-draft/releases/2.1.0/dcat-ap_2.1.0_shacl_mdr-vocabularies.shape.ttl. In that file, a SHACL shape constraint is expressed for each controlled vocabulary that we are able to validate by reusing its source representation. This expression is one of the possible ways to express the constraint.
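
As an illustration of what a membership check based on the source representation can look like, here is a sketch that is not copied from the linked file. It assumes that the vocabulary is loaded alongside the data and that each concept declares its scheme with skos:inScheme; the shape names and ex: namespace are hypothetical.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://www.example.org/shapes#> .   # hypothetical namespace

# A value conforms if it is stated to be in the language concept scheme.
ex:LanguageRestriction                             # hypothetical shape name
  a sh:NodeShape ;
  sh:property [
    sh:path skos:inScheme ;
    sh:hasValue <http://publications.europa.eu/resource/authority/language> ;
    sh:minCount 1 ;
  ] .

ex:DatasetLanguageMembershipShape                  # hypothetical shape name
  a sh:NodeShape ;
  sh:targetClass dcat:Dataset ;
  sh:property [
    sh:path dct:language ;
    sh:node ex:LanguageRestriction ;
    sh:severity sh:Violation ;
  ] .
```

With this style the check follows the published vocabulary without maintaining a separate list, at the cost of requiring the vocabulary triples to be available to the validator.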