RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0
241 stars 63 forks source link

Improve sh:node reporting #165

Closed gtfierro closed 1 year ago

gtfierro commented 1 year ago

I'm currently working on a tool that consumes a SHACL validation report graph in order to provide recommendations for what can be changed in the source graph. Most of the shapes I have use sh:node components to combine different requirements together. However, when one of these components is violated, the report is not very clear:

Constraint Violation in NodeConstraintComponent (http://www.w3.org/ns/shacl#NodeConstraintComponent):
    Severity: ns2:Violation
    Source Shape: <urn:parent_shape>
    Focus Node: <urn:ex/a>
    Value Node: <urn:ex/a>
    Message: Value does not conform to Shape <urn:child_shape_referenced_via_sh_node>

(I'm using the text output here rather than the turtle graph, but the content is basically the same). The only information about what component actually failed is inside the message string. If I do find my own custom sh:message for the parent shape, then I lose the information about the actual component that failed. I'd rather just use an actual property off of the validation result rather than have to parse through the string.

In this pull request, I just added the node shape to the generated report as the Source Constraint. Now the report looks like this:

Constraint Violation in NodeConstraintComponent (http://www.w3.org/ns/shacl#NodeConstraintComponent):
    Severity: ns2:Violation
    Source Shape: <urn:parent_shape>
    Focus Node: <urn:ex/a>
    Value Node: <urn:ex/a>
    Source Constraint: <urn:child_shape_referenced_via_sh_node>
    Message: Value does not conform to Shape <urn:child_shape_referenced_via_sh_node>

I couldn't find anything in the SHACL standar that says that this has to be given as part of the validation report, which is unfortunate, because this feels like really critical information for being able to use the report to fix your data graph.

Let me know what you think and if there's any tests you think I should write. I hope I'm not completely off-base here!


I have some rogue changes in here that I'm trying to remove. I'm only proposing the change in https://github.com/RDFLib/pySHACL/pull/165/files#diff-0bb79e7b8395a62aa47acd04ac1cdcccde2454ddf329aea91358773ee07f9781

ashleysommer commented 1 year ago

Hi @gtfierro Thanks for your efforts here. You are right that is is a big oversight that the SHACL spec does not include this feature for sh:node constraint constraint reporting, and this is something I've been meaning to enhance in PySHACL.

As you can see in the failing tests, the solution is not as simple as it seems on the surface. The validation result property sh:sourceConstraint is part of SHACL-SPARQL and is supposed to be populated by a SPARQL Constraint. In this PR you are populating this property with an instance of NodeShape, that is inconsistent with the SHACL Validation Report schema. The report schema is strictly defined, there are tests in the W3C SHACL test suite (SHT) and in the Datashapes DASH shacl shape test suite, that validate the output of the report graph to ensure it is consistent with the expected content.

By adding the reference to the NodeShape to the sh:sourceConstraint property, it is not only misleading nomenclature (a Shape and a Constraint are different things, not to be confused) but it breaks the NodeConstraintComponent tests in the test suite, causing PySHACL to be non-compliant with the requirements of the SHACL spec.

I will think about this problem more, and will come up with a cleaner solution.

gtfierro commented 1 year ago

Thanks for the detailed and prompt response, as always! Totally understand what you've outlined above. For now, I think I might explore some inlining of the constraints on sh:node so I can get useful output.

Totally missed that sourceConstraint was part of SHACL-SPARQL rather than the base standard...whoops!

ashleysommer commented 1 year ago

@gtfierro A new contributor has added a feature that is similar to what you were trying to achieve here, but using sh:detail. https://github.com/RDFLib/pySHACL/pull/185

gtfierro commented 1 year ago

Beautiful! I'm looking forward to digging into it :). Really appreciate you following up on this