@volkerjaenisch my apologies for the late reply.
Firstly, I would like to state that "validation is complex". Validation is always (a) a check against (b) a collection of constraints in (c) a data exchange context. All three elements (a), (b) and (c) are subject to an agreement, which in practice differs per case.
(a) The check depends on the engine, on the data that is provided to check, and on the format. For example, I have seen XML-based DCAT-AP validation checkers that cannot differentiate between an empty string and an absent value (see the sketch after this list).
(b) If you share data as JSON(-LD), it is common to exchange only the code for coded values, e.g. `licence:cc-by-40`, while the semantics expect that this code denotes a LicenceDocument with additional properties. Since this is background knowledge, sender and receiver might have a parallel agreement that the code is correctly modelled, and these constraints are then not included in the validation collection.
(c) The exchange context can have a large impact: suppose two parties PA and PB share data with a third party PC. One assumption might be that the data provided by PA is disjoint from that of PB; if that is not the case, different validation expectations apply.
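To illustrate point (a) with SHACL itself: a toy sketch using pySHACL, where the shape and data are invented for illustration and are not the DCAT-AP shapes. Whether an empty string and an absent value are told apart depends entirely on which constraints are in the collection.

```python
from rdflib import Graph
from pyshacl import validate

# A toy shape: dct:title must be present at least once.
shapes_ttl = """
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/shapes/> .

ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:title ;
        sh:minCount 1    # presence only; add sh:minLength 1 to also reject ""
    ] .
"""

data_ttl = """
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/data/> .

ex:d1 a dcat:Dataset ; dct:title "" .   # empty string: passes minCount
ex:d2 a dcat:Dataset .                  # absent value: violates minCount
"""

conforms, _, report = validate(
    Graph().parse(data=data_ttl, format="turtle"),
    shacl_graph=Graph().parse(data=shapes_ttl, format="turtle"),
)
print(report)   # only ex:d2 is reported
```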
In your case, you sketched the situation for codelists. DCAT-AP takes into account the full scope of data exchange. In that scope, exchanging a bare code such as '2132312' does not contribute to the understanding of the data. To stimulate code publishers to take that aspect into account, DCAT-AP sets a requirement for a human-readable title on concept schemes and a prefLabel on concepts. For the harvesting process or pure data analysis this quality constraint is not relevant; however, if you as a receiver want to build a human-readable view, it becomes crucial.
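To make that requirement concrete, here is a minimal sketch with pySHACL; the shape is a simplified stand-in for the actual DCAT-AP 2.1.1 shape, and the code IRI is invented:

```python
from rdflib import Graph
from pyshacl import validate

# Simplified stand-in for the DCAT-AP requirement on concepts.
shapes_ttl = """
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/shapes/> .

ex:ConceptShape a sh:NodeShape ;
    sh:targetClass skos:Concept ;
    sh:property [
        sh:path skos:prefLabel ;
        sh:minCount 1 ;
        sh:message "Each skos:Concept needs a human-readable prefLabel."
    ] .
"""

# A bare code published without any label: useless for a human-readable view.
data_ttl = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://example.org/codes/2132312> a skos:Concept .
"""

conforms, _, report = validate(
    Graph().parse(data=data_ttl, format="turtle"),
    shacl_graph=Graph().parse(data=shapes_ttl, format="turtle"),
)
print(report)   # flags the missing prefLabel on the bare code
```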
The SHACL constraints of DCAT-AP cannot offer one combination of (a), (b) and (c) that matches everyone's situation. It is up to you to build the right fit-for-purpose combination. DCAT-AP only provides the menu, not your selection.
Back to your concrete case: to resolve the validation errors, either (1) reach out to the codelist owners to support your constraints, (2) reduce the collection of validation rules, or do both.
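For option (2), one way to reduce the rules is to deactivate the shapes that do not fit your context before validating. A sketch assuming pySHACL/rdflib, where the shape IRI is a placeholder you would replace with the actual IRI from dcat-ap_2.1.1_shacl_shapes.ttl:

```python
from rdflib import Graph, Literal, Namespace, URIRef

SH = Namespace("http://www.w3.org/ns/shacl#")

# Load the published shapes and switch off the ones you chose not to enforce.
shapes = Graph().parse("dcat-ap_2.1.1_shacl_shapes.ttl", format="turtle")

# Placeholder IRI: replace with the shape you decided not to apply.
unwanted_shape = URIRef("http://example.org/replace-with-the-real-shape-iri")

# sh:deactivated true tells a conforming SHACL processor to skip this shape.
shapes.add((unwanted_shape, SH.deactivated, Literal(True)))

shapes.serialize("dcat-ap_2.1.1_shacl_shapes_reduced.ttl", format="turtle")
```

Whether this works for a particular rule depends on how it is declared in the shapes file; property shapes that are blank nodes cannot be addressed by IRI and would have to be removed from the graph instead.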
Dear SEMICeu!
I am using the ITB-Validator with
dcat-ap_2.1.1_shacl_shapes.ttl and dcat-ap-de-imports.ttl
and also for comparison pySHACL with
shapes: dcat-ap_2.1.1_shacl_shapes.ttl and owl_graph: dcat-ap-de-imports.ttl
In both cases the validation reports violations in the OWL data.
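For reference, the pySHACL call looks roughly like this (a sketch; the payload file name is a placeholder, and pySHACL's keyword for the OWL data is ont_graph):

```python
from rdflib import Graph
from pyshacl import validate

data = Graph().parse("my_dcat_ap_payload.ttl", format="turtle")   # placeholder name
shapes = Graph().parse("dcat-ap_2.1.1_shacl_shapes.ttl", format="turtle")
ontology = Graph().parse("dcat-ap-de-imports.ttl", format="turtle")

conforms, results_graph, results_text = validate(
    data,
    shacl_graph=shapes,
    ont_graph=ontology,   # the OWL data is mixed into the data graph before validation
)
print(results_text)
```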
Yes, the violations in the OWL data are surely there. But how does this help my software to validate the payload DCAT-AP data? I cannot change the OWL files - but at the end of the day my software has to deliver correct DCAT-AP data.
This problem stems from the fact that SHACL validators enrich the DCAT-AP payload graph with the OWL data and then validate the complete graph. But this is a technical detail and no real excuse.
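If I understand the mechanics correctly, one can at least tell apart violations on payload nodes from violations on nodes that only exist in the imported OWL data, roughly like this (a sketch with pySHACL/rdflib; the payload file name is a placeholder):

```python
from rdflib import Graph, Namespace, RDF
from pyshacl import validate

SH = Namespace("http://www.w3.org/ns/shacl#")

payload = Graph().parse("my_dcat_ap_payload.ttl", format="turtle")   # placeholder name
shapes = Graph().parse("dcat-ap_2.1.1_shacl_shapes.ttl", format="turtle")
imports = Graph().parse("dcat-ap-de-imports.ttl", format="turtle")

_, results, _ = validate(payload, shacl_graph=shapes, ont_graph=imports)

for result in results.subjects(RDF.type, SH.ValidationResult):
    focus = results.value(result, SH.focusNode)
    # Is the offending node described in my payload, or only in the OWL imports?
    origin = "payload" if (focus, None, None) in payload else "imported OWL data"
    print(origin, focus, results.value(result, SH.resultMessage))
```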
My algorithm to make the data DCAT-AP compatible is to delete the properties/nodes from the graph that the validator marks as violations. Then the graph is validated again, and the process iterates until no violations are found.
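Roughly, the loop looks like this (a sketch with pySHACL/rdflib; file names are placeholders):

```python
from rdflib import Graph, Namespace, RDF
from pyshacl import validate

SH = Namespace("http://www.w3.org/ns/shacl#")

data = Graph().parse("my_dcat_ap_payload.ttl", format="turtle")   # placeholder name
shapes = Graph().parse("dcat-ap_2.1.1_shacl_shapes.ttl", format="turtle")
imports = Graph().parse("dcat-ap-de-imports.ttl", format="turtle")

while True:
    conforms, results, _ = validate(data, shacl_graph=shapes, ont_graph=imports)
    if conforms:
        break
    size_before = len(data)
    for result in results.subjects(RDF.type, SH.ValidationResult):
        focus = results.value(result, SH.focusNode)
        path = results.value(result, SH.resultPath)
        if path is not None:
            data.remove((focus, path, None))    # drop the offending property
        else:
            data.remove((focus, None, None))    # drop the whole offending node
            data.remove((None, None, focus))
    if len(data) == size_before:
        # Nothing removable in the payload (e.g. the violation lives in the
        # imported OWL data), so stop instead of looping forever.
        break
```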
This leads to a suboptimal result when violations occur in the OWL data. For instance, http://publications.europa.eu/resource/authority/dataset-type/NAL has no required prefLabel and is therefore deleted. The DCAT dataset using this artifact may then become invalid, since the deleted property may itself be required. This behavior is IMHO wrong: the data provider has correctly chosen the artifact http://publications.europa.eu/resource/authority/dataset-type/NAL; he may even be forced to use it by another SHACL rule. He should not be punished by having his dataset discarded because of a faulty OWL definition that lies outside the scope of his DCAT-AP data.
Here are other examples of violations from other OWL data.
The former are from the ITB test bench, the following from pySHACL.
I am new to SHACL, so maybe I am looking at this from the wrong direction or misunderstanding things entirely.
Any help appreciated
Volker