SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

shacl/v.1.2.1 - dct:publisher does not have shape http://data.europa.eu/r5r/mdrcv#CorporateBodyRestriction #119

Closed init-dcat-ap-de closed 2 years ago

init-dcat-ap-de commented 4 years ago

We tested a few of our datasets with the EDP tool (https://www.europeandataportal.eu/shacl/) and found that some of the reported validation errors appear to be false positives:

Example RDF: https://www.govdata.de/ckan/dataset/strassenverkehrsunfalle-in-schleswig-holstein-im-november-2007-vorlaufige-zahlen.rdf

XML:

    <dct:publisher>
      <foaf:Organization rdf:about="http://opendata.schleswig-holstein.de/organization/statistikamt-nord">
        <foaf:name>Statistisches Amt für Hamburg und Schleswig-Holstein - Anstalt des öffentlichen Rechts - (Statistikamt Nord)</foaf:name>
      </foaf:Organization>
    </dct:publisher>

Error:

    {
      "@id": "_:b11",
      "@type": "sh:ValidationResult",
      "focusNode": "http://opendata.schleswig-holstein.de/dataset/StaNord_CMS:55265",
      "resultMessage": "Value does not have shape http://data.europa.eu/r5r/mdrcv#CorporateBodyRestriction",
      "resultPath": "dct:publisher",
      "resultSeverity": "sh:Violation",
      "sourceConstraintComponent": "sh:NodeConstraintComponent",
      "sourceShape": "_:b7",
      "value": "http://opendata.schleswig-holstein.de/organization/statistikamt-nord"
    }

Possible reason/problem in the SHACL-files: A workaround would be to extract the publisher node, add all missing classes (not part of this issue), and add it to the Corporate bodies list using skos:inScheme:

    <dcat:Dataset>
      <dct:publisher rdf:resource="..............."/>
    </dcat:Dataset>

    <rdf:Description rdf:about="...............">
      <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
      <skos:inScheme rdf:resource="http://publications.europa.eu/resource/authority/corporate-body"/>
      <foaf:name>Statistisches Amt für Hamburg und Schleswig-Holstein - Anstalt des öffentlichen Rechts - (Statistikamt Nord)</foaf:name>
    </rdf:Description>

But in my opinion (and that's how I'm reading the DCAT-AP specification), the publisher shouldn't be limited to that list:

The Corporate bodies NAL must be used for European institutions and a small set of international organisations. In case of other types of organisations, national, regional or local vocabularies should be used.

Background: dcat-ap.de is a German variant of DCAT-AP. Our goal was that all valid dcat-ap.de files are also valid DCAT-AP files. We only reduced options and added specific fields.

GovData.de is one example of a data portal that uses dcat-ap.de. GovData.de data is also collected by the EDP. We are now trying to improve the metadata quality, especially the DCAT-AP compliance (https://www.europeandataportal.eu/mqa/govdata?locale=en).

Also posted at: https://gitlab.com/european-data-portal/mqa/shacl-validation/issues/5

bertvannuffelen commented 4 years ago

This is a choice for the EDP, not for the DCAT-AP specification. The specification states:

The Corporate bodies NAL must be used for European institutions and a small set of international organisations. In case of other types of organisations, national, regional or local vocabularies should be used.

Thus the validation process should check whether institutions or organisations listed in the Corporate bodies NAL appear in the provided data under a different identifier. If so, it should flag them.
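As a rough illustration of such a check (a sketch only, not how the EDP validator works): the snippet below flags a publisher when its name matches a NAL label but its IRI lies outside the Corporate bodies namespace. The function names, the name-matching heuristic, and all sample data, including the `EXAMPLE` NAL entry, are assumptions made for illustration.

```python
# Sketch only: flag publishers that look like Corporate bodies NAL entries
# but are identified by a non-NAL IRI. All sample data below is hypothetical.
NAL_BASE = "http://publications.europa.eu/resource/authority/corporate-body/"


def normalise(name):
    """Lower-case and collapse whitespace so label comparison is lenient."""
    return " ".join(name.lower().split())


def flag_nal_aliases(publishers, nal_labels):
    """Return IRIs of publishers whose name matches a NAL label
    but whose IRI is outside the NAL namespace.

    publishers: iterable of (iri, name) pairs taken from dct:publisher values.
    nal_labels: mapping of NAL concept IRI -> preferred label.
    """
    known = {normalise(label) for label in nal_labels.values()}
    return [
        iri
        for iri, name in publishers
        if not iri.startswith(NAL_BASE) and normalise(name) in known
    ]


# Illustrative NAL extract (assumed content, not the real authority table).
nal_labels = {NAL_BASE + "EXAMPLE": "Example European Institution"}

publishers = [
    # National body with its own identifier: acceptable, not flagged.
    ("http://opendata.schleswig-holstein.de/organization/statistikamt-nord",
     "Statistikamt Nord"),
    # NAL body published under a local IRI: flagged.
    ("http://example.org/org/eei", "example  european institution"),
    # NAL body using its NAL IRI: not flagged.
    (NAL_BASE + "EXAMPLE", "Example European Institution"),
]

flagged = flag_nal_aliases(publishers, nal_labels)
print(flagged)
```

With a check along these lines, a national publisher such as the one in the example above would pass, while only re-identified NAL bodies would be reported. Exact name matching is a simplification; a real check would likely need multilingual labels and alternative names.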

barthanssens commented 4 years ago

See also #87

bertvannuffelen commented 2 years ago

At the WG meeting on 21 Oct 2021, it was decided not to upgrade the historic SHACL representations and to focus on the new representations instead.