SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

shacl/v.1.2.1 - If something is a dcat:landingPage, it should be implied that it's a document. #116

Closed init-dcat-ap-de closed 2 years ago

init-dcat-ap-de commented 4 years ago

We tested a few of our datasets with the EDP tool (https://www.europeandataportal.eu/shacl/) and found that some of the reported validation errors are false positives:

Example RDF: https://www.govdata.de/ckan/dataset/strassenverkehrsunfalle-in-schleswig-holstein-im-november-2007-vorlaufige-zahlen.rdf

XML:

<dcat:landingPage rdf:resource="http://www.statistik-nord.de"/>

Error:

    {
      "@id": "_:b2",
      "@type": "sh:ValidationResult",
      "focusNode": "http://opendata.schleswig-holstein.de/dataset/StaNord_CMS:55265",
      "resultMessage": "Value does not have class http://xmlns.com/foaf/0.1/Document",
      "resultPath": "dcat:landingPage",
      "resultSeverity": "sh:Violation",
      "sourceConstraintComponent": "sh:ClassConstraintComponent",
      "sourceShape": "_:b3",
      "value": "http://www.statistik-nord.de"
    }

Possible reason/problem in the SHACL files: the resource of the landingPage should be assumed to be a foaf:Document if nothing else is stated. Otherwise we would have to add something like the following to each RDF file:

<rdf:Description rdf:about="http://www.statistik-nord.de">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document" />
</rdf:Description>
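For illustration, the workaround above could be automated before validation. This is only a minimal sketch using the Python standard library; the function name `add_landing_page_types` is made up for this example, and it assumes the input is RDF/XML with an `rdf:RDF` root element and landing pages given via `rdf:resource`. It does not check whether a type is already declared.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"


def add_landing_page_types(rdf_xml: str) -> str:
    """Return RDF/XML with an explicit foaf:Document type for every landing page.

    Hypothetical helper: assumes the root element is rdf:RDF.
    """
    # Keep the usual prefixes when serialising the result.
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dcat", DCAT)
    root = ET.fromstring(rdf_xml)

    # Collect the IRIs referenced by dcat:landingPage anywhere in the document.
    pages = sorted({
        el.get(f"{{{RDF}}}resource")
        for el in root.iter(f"{{{DCAT}}}landingPage")
        if el.get(f"{{{RDF}}}resource")
    })

    for iri in pages:
        # Append an rdf:Description with rdf:type foaf:Document for each IRI.
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": iri})
        ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": FOAF_DOCUMENT})

    return ET.tostring(root, encoding="unicode")
```

Having to run a step like this on every file is exactly the overhead we would like to avoid.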

Background: dcat-ap.de is a German variation of dcat-ap. Our goal was that all valid dcat-ap.de files are also valid dcat-ap files. We only reduced options and added specific fields.

GovData.de is one example of a data portal where dcat-ap.de is used. GovData.de data is also collected by the EDP. We are now trying to improve the metadata quality, especially the dcat-ap compliance (https://www.europeandataportal.eu/mqa/govdata?locale=en).

Also posted at: https://gitlab.com/european-data-portal/mqa/shacl-validation/issues/2

bertvannuffelen commented 4 years ago

According to the specification of dcat:landingPage ( https://www.w3.org/TR/vocab-dcat-2/#Property:resource_landing_page ), the range is foaf:Document.

Here an agreement must be made about the class membership and the severity of the violation.

On the one hand, one can argue that the data provider must explicitly declare the class membership to ensure awareness of the expected interpretation of the provided value. This approach corresponds to XML messages where the tags are named after the class membership. On the other hand, the value provided can be assumed to be a member of the desired class, in which case it is not necessary to check the membership at all.

Note that the last option might lead to conflicts in a universal reasoning system, because one could introduce information that the provided value is also a foaf:Project (because of some other statements on the web), and according to the foaf specification this is inconsistent. But in most cases these complex situations won't occur, and the class membership can mostly be safely assumed.
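To make the conflict concrete, here is a small sketch in plain Python (no RDF library). It assumes that a reasoner infers foaf:Document from the range of dcat:landingPage and that foaf:Document and foaf:Project are disjoint. The triple representation and the names `infer_types` and `find_conflicts` are invented for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "rdf:type"

# Schema facts assumed by this sketch.
RANGES = {"dcat:landingPage": FOAF + "Document"}
DISJOINT = {frozenset({FOAF + "Document", FOAF + "Project"})}


def infer_types(triples):
    """Map each node to its declared types plus the types implied by property ranges."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p in RANGES:
            # Range inference: the object of p is a member of the range class.
            types.setdefault(o, set()).add(RANGES[p])
    return types


def find_conflicts(triples):
    """Return (node, class_a, class_b) for every node typed with two disjoint classes."""
    conflicts = []
    for node, classes in infer_types(triples).items():
        for pair in DISJOINT:
            if pair <= classes:
                conflicts.append((node, *sorted(pair)))
    return conflicts


dataset = "http://example.org/dataset"  # hypothetical dataset IRI
page = "http://www.statistik-nord.de"
triples = [
    (dataset, "dcat:landingPage", page),
    (page, RDF_TYPE, FOAF + "Project"),  # a statement from elsewhere on the web
]
print(find_conflicts(triples))
```

Without the second triple the list of conflicts is empty, which is the common case where the membership can be safely assumed.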

Finally, the choice of severity for the error also depends on the usage. If the receiving system queries for all documents with ?s a foaf:Document, then the constraint must be included; otherwise it could be omitted.
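As a sketch of how a usage-specific severity could be applied without touching the shapes, a receiving system might post-process the validation report. The example below assumes the results are available as Python dicts with the same keys as the JSON report quoted earlier in this issue; `apply_severity_policy` and its parameters are hypothetical names, and `sh:Warning` is used as the relaxed severity.

```python
CLASS_CONSTRAINT = "sh:ClassConstraintComponent"


def apply_severity_policy(results, relaxed_paths, relaxed_severity="sh:Warning"):
    """Return copies of the results, downgrading class-constraint results on relaxed paths."""
    adjusted = []
    for result in results:
        result = dict(result)  # shallow copy so the original report is left untouched
        if (result.get("sourceConstraintComponent") == CLASS_CONSTRAINT
                and result.get("resultPath") in relaxed_paths):
            result["resultSeverity"] = relaxed_severity
        adjusted.append(result)
    return adjusted


report = [{
    "focusNode": "http://opendata.schleswig-holstein.de/dataset/StaNord_CMS:55265",
    "resultPath": "dcat:landingPage",
    "resultSeverity": "sh:Violation",
    "sourceConstraintComponent": "sh:ClassConstraintComponent",
    "value": "http://www.statistik-nord.de",
}]

# A system that never queries for foaf:Document could relax this path.
for item in apply_severity_policy(report, {"dcat:landingPage"}):
    print(item["resultPath"], item["resultSeverity"])
```

A system that does query for `?s a foaf:Document` would simply pass an empty set of relaxed paths and keep the violation.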

Given that there is no clear correct choice, and since DCAT-AP is an application-neutral specification, it is best to keep the SHACL constraints as close as possible to the constraints expressed in the specification. But of course, cases such as this one might influence how the SHACL constraints are expressed, their severity, and the modularisation that is used. A good organisation of the shapes might assist the creation of implementation-specific validation processes that yield the desired semantics and satisfy the constraints expressed in the specification.

init-dcat-ap-de commented 4 years ago

> Given that there is no clear correct choice and since DCAT-AP is application neutral specification it is best to make the SHACL constraints as close to the constraints expressed in the specification.

I would argue the other way around. Given that DCAT-AP is an application-neutral specification, it is best to make the SHACL constraints as loose as possible with respect to the constraints expressed in the specification.

It is easier to reuse the SHACL shapes provided by SEMIC and then add application-specific constraints on top. Stripping all unnecessary constraints from the provided shapes amounts to a complete rewrite.

bertvannuffelen commented 2 years ago

During the WG meeting of 21 Oct 2021, it was decided not to upgrade the historic SHACL representations and to focus on the new representations instead.