RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

PyShacl generating validation errors against Shape Graph generated from Data Graph #234

Closed amnag closed 1 week ago

amnag commented 3 weeks ago

Hi, I am generating a shapes graph file from a data graph file using https://github.com/sparna-git/shacl-play, run from the command line as described at https://github.com/sparna-git/shacl-play/wiki/Run-SHACL-Play-App-from-command-line:

java -jar shacl-play-app-0.8.0-onejar.jar generate --input dataGraph.ttl --output shapesGraph.ttl

With this shapes file, I next try to validate the data graph.

If I run pyshacl -s shapesGraph.ttl -m -i rdfs -a -j -f human dataGraph.ttl -o validation_report.out I get a significant number of validation errors.

According to pyshacl documentation, providing the path to the shapes file is optional. If I run pyshacl -m -i rdfs -a -j -f human dataGraph.ttl -o validation_report.out then the data graph conforms as True and there are no validation errors. I also tried running the same command without the -m -i rdfs -a -j options and I still get no validation errors. Thus whether I get validation errors or not depends on whether I provide the shapes graph file or not.

1) Can you please help me understand why this discrepancy is occurring?
2) Ideally, if I create a shapes graph from a data graph, shouldn't validating the data graph against that shapes graph yield no validation errors?
3) Is there a way pyshacl can be used to automate the generation of a shapes graph from a data graph?

Hi @ashleysommer, would really appreciate your help with this issue. Thanks in advance, @amnag.

ashleysommer commented 3 weeks ago

Hi @amnag. First, a correction on terminology. A "Validation Error" (or Validation Failure) is a runtime error within the SHACL validator: it occurs when the validator encounters a condition where it cannot perform the shape constraint checks. It does not indicate non-conformant data in the data graph.

Data nodes that don't conform to shapes in the Shapes graph are indicated by a "validation result" entry that is added to the validation report.

I assume you're talking about "validation results", not "validation failures" in the paragraphs above.

> If I run pyshacl -m -i rdfs -a -j -f human dataGraph.ttl -o validation_report.out then the data graph conforms as True and there are no validation errors. Thus whether I get validation errors or not depends on whether I provide the shapes graph file or not.

Yes, this is expected. A validation determines a successful result (a "conformant" datagraph) if there are no "Validation Result" entries generated in the resulting validation report. If you don't provide a shapes file, then there are no constraints to check against, so no "validation results" are generated, so the datagraph is determined to be "conformant". That is correct as per the SHACL spec and is how all SHACL validators work.
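To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not PySHACL's implementation and does not use its API; `MinCountShape`, `validate`, and the dictionary-based "data graph" are hypothetical names invented for illustration only. It assumes a single simplified constraint type (a minimum value count on a property) and shows two things: a validation result is recorded for non-conformant data while a validation failure is raised as an error, and an empty set of shapes produces no results, so the data is reported as conformant.

```python
from dataclasses import dataclass


class ValidationFailure(Exception):
    """The validator itself could not perform a check (a runtime error)."""


@dataclass
class MinCountShape:
    # Hypothetical, heavily simplified stand-in for a SHACL property shape.
    target_class: str
    path: str
    min_count: int


def validate(data, shapes):
    """Return (conforms, results) for a toy data graph.

    data maps a node name to {"type": ..., property_name: [values, ...]}.
    """
    results = []
    for shape in shapes:
        if not isinstance(shape.min_count, int) or shape.min_count < 0:
            # Malformed shape: the check cannot be performed at all.
            raise ValidationFailure(f"bad min_count on {shape.path!r}")
        for node, props in data.items():
            if props.get("type") != shape.target_class:
                continue  # this shape does not target this node
            if len(props.get(shape.path, [])) < shape.min_count:
                # Non-conformant data: record a validation result.
                results.append((node, shape.path, "fewer values than min_count"))
    # Conformance means the report contains no validation results.
    return len(results) == 0, results


data = {
    "alice": {"type": "Person", "name": ["Alice"]},
    "bob": {"type": "Person"},  # no "name" value
}
shapes = [MinCountShape("Person", "name", 1)]

no_shapes = validate(data, [])        # nothing to check against
with_shapes = validate(data, shapes)  # "bob" violates the constraint
```

With no shapes the loop body never runs, so the report is empty and the data is conformant by default; the same data produces one validation result once a shape is supplied.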

> Ideally, if I create a shapes graph from a data graph, shouldn't validating the data graph against that shapes graph yield no validation errors?

I've never used that shapes-graph-from-data-graph tool before, so I don't know how it generates shapes files, or what SHACL features and patterns it uses. But you're right, I would expect the data graph to pass validation if you use a SHACL shapes file that was generated from the very data graph you're validating.

> I get a significant number of validation errors.

Again, I assume you're referring to "validation results", not actually "Validation Errors".

I can't offer any help to solve this if you haven't provided details of the output you saw, or even the SHACL file and data graph you're using.

> Is there a way pyshacl can be used to automate the generation of a shapes graph from a data graph?

No, that is not what PySHACL is for.

ashleysommer commented 2 weeks ago

@amnag Can this issue be closed now?

amnag commented 1 week ago

Hi @ashleysommer, thank you very much!