SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

shacl/v.1.2.1 - dct:publisher - foaf:Organization is also an Agent #117

Closed init-dcat-ap-de closed 2 years ago

init-dcat-ap-de commented 4 years ago

We tested a few of our datasets with the EDP tool (https://www.europeandataportal.eu/shacl/) and found that some of the reported validation errors are false positives:

Example RDF: https://www.govdata.de/ckan/dataset/strassenverkehrsunfalle-in-schleswig-holstein-im-november-2007-vorlaufige-zahlen.rdf

XML:

    <dct:publisher>
      <foaf:Organization rdf:about="http://opendata.schleswig-holstein.de/organization/statistikamt-nord">
        <foaf:name>Statistisches Amt für Hamburg und Schleswig-Holstein - Anstalt des öffentlichen Rechts - (Statistikamt Nord)</foaf:name>
      </foaf:Organization>
    </dct:publisher>

Error:

    {
      "@id": "_:b4",
      "@type": "sh:ValidationResult",
      "focusNode": "http://opendata.schleswig-holstein.de/dataset/StaNord_CMS:55265",
      "resultMessage": "Value does not have class http://xmlns.com/foaf/0.1/Agent",
      "resultPath": "dct:publisher",
      "resultSeverity": "sh:Violation",
      "sourceConstraintComponent": "sh:ClassConstraintComponent",
      "sourceShape": "_:b5",
      "value": "http://opendata.schleswig-holstein.de/organization/statistikamt-nord"
    }

Possible cause in the SHACL files: every foaf:Organization is also a foaf:Agent, so this error should not be raised.
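To illustrate why the validator reports this, here is a minimal sketch in plain Python (not the EDP tool's actual implementation). It approximates the sh:class check as defined in SHACL: a value conforms if one of its rdf:type values is the target class or is linked to it through rdfs:subClassOf triples that are present in the data graph. The triple-set representation and the helper names `superclasses` and `has_class` are assumptions made for illustration only.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(graph, cls):
    """Return cls plus every class reachable from it via rdfs:subClassOf in graph."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def has_class(graph, node, target):
    """Approximate sh:class: node has a type that is target or a subclass of target."""
    for s, p, o in graph:
        if s == node and p == RDF_TYPE and target in superclasses(graph, o):
            return True
    return False


PUBLISHER = "http://opendata.schleswig-holstein.de/organization/statistikamt-nord"

# The data as published: only the foaf:Organization type is stated.
data_only = {(PUBLISHER, RDF_TYPE, "foaf:Organization")}

# The same data plus the subclass axiom from the FOAF vocabulary.
with_foaf = data_only | {("foaf:Organization", SUBCLASS_OF, "foaf:Agent")}

print(has_class(data_only, PUBLISHER, "foaf:Agent"))
print(has_class(with_foaf, PUBLISHER, "foaf:Agent"))
```

Under these assumptions the first check fails and the second succeeds, so the outcome depends on whether the FOAF subclass triple is part of the graph being validated.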

Background: dcat-ap.de is a German variant of dcat-ap. Our goal was that all valid dcat-ap.de files are also valid dcat-ap files. We only reduced options and added specific fields.

GovData.de is one example of a data portal where dcat-ap.de is used. GovData.de data is also harvested by the EDP. We are now trying to improve the metadata quality, especially the dcat-ap compliance (https://www.europeandataportal.eu/mqa/govdata?locale=en).

Also posted at: https://gitlab.com/european-data-portal/mqa/shacl-validation/issues/3

bertvannuffelen commented 4 years ago

This is a typical validation situation. It originates from the fact that it is unclear what information should be taken into account as background knowledge for the validation.

In this case, the use of the class foaf:Agent might suggest that the file behind the namespace (http://xmlns.com/foaf/spec/index.rdf) is part of the background knowledge. But one could also argue that a data provider who wants to rely on the subclass definition should supply that relationship as part of the provided data.

Which option is valid, however, depends on the next step, namely the system that is going to use the data. For example, if that system runs a query such as ?s a foaf:Agent to find all publishers, and it does not apply reasoning based on the subclass definition, then the validation should not include the FOAF specification as background knowledge.
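A rough sketch of that consumer scenario, again in plain Python: a pattern match for `?s a foaf:Agent` with no reasoning finds nothing in the data as published, and only finds the publisher once the foaf:Agent type is stated explicitly. The function names `instances_of` and `materialize_agent` are hypothetical, and the provider-side step of adding the extra type is one possible remedy rather than something the specification prescribes.

```python
RDF_TYPE = "rdf:type"
PUBLISHER = "http://opendata.schleswig-holstein.de/organization/statistikamt-nord"


def instances_of(graph, cls):
    """Plain pattern match for `?s a <cls>`, with no subclass reasoning."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


def materialize_agent(graph):
    """Hypothetical provider-side step: type every foaf:Organization as foaf:Agent too."""
    extra = {
        (s, RDF_TYPE, "foaf:Agent")
        for s, p, o in graph
        if p == RDF_TYPE and o == "foaf:Organization"
    }
    return graph | extra


published = {(PUBLISHER, RDF_TYPE, "foaf:Organization")}

# Without reasoning, the consumer's query misses the publisher.
print(instances_of(published, "foaf:Agent"))

# With the type made explicit in the data, the same query finds it.
print(instances_of(materialize_agent(published), "foaf:Agent"))
```

A consumer of this kind behaves like a validator that ignores the FOAF vocabulary, which is why a violation reported without background knowledge is meaningful for it.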

Given that there is no clear correct choice, and since DCAT-AP is an application-neutral specification, it is best to keep the SHACL constraints as close as possible to the constraints expressed in the specification. But of course, cases such as this one might influence how the SHACL constraints are expressed, their severity, and the modularisation that is used. A good organisation of the constraints might assist the creation of implementation-specific validation processes that yield the desired semantics and satisfy the constraints expressed in the specification.

bertvannuffelen commented 2 years ago

During the WG meeting of 21 Oct 2021, it was decided not to upgrade the historic SHACL representations and to focus on the new representations.