RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0
242 stars 63 forks source link

Validator runtime error #40

Closed JoelBender closed 4 years ago

JoelBender commented 4 years ago

Validator says there is a runtime error, no additional details turning on debug:

$ pyshacl -s 03-Network.ttl -e 03-Network.ttl sample-network.ttl
Validator encountered a Runtime Error.

Info:

$ python3.6
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyshacl
INFO:rdflib:RDFLib Version: 4.2.2
>>> print(pyshacl.__version__)
0.11.3.post1

Turtle files attached as text files. 03-Network.ttl.txt sample-network.ttl.txt

ashleysommer commented 4 years ago

Hi @JoelBender I've determined this error is caused by the use of a Recursive Shape. See the comments about recursive shapes in the SHACL spec here.

The bacnet:Node shape is recursive, because it has property constraints (:layeredAbove and :layeredBelow) where each target in the property-path must themselves conform to the bacnet:Node shape. That causes pyshacl to go into a loop, when checking conformance of node :b and :c in sample-network. Eg, :b :layeredAbove :c -> :c :layeredBelow :b -> :b :layeredAbove :c -> ... until it crashes.

Honestly I'm surprised this is the first time I've come across this issue. And I'm surprised there isn't an example of this in the Standard SHACL Test-Suite. We'll have to decide what to do about it, and it may be fixed in a future pySHACL release.

JoelBender commented 4 years ago

Thank you. I suspect that they didn't want to include tests that would break implementations in an effort to encourage adoption, maybe this will turn out to be an O(n ** n) problem...heh! In the mean time, I'll remove the :layeredAbove for now, it's not as important as the :layeredBelow. The end result will be something like the SEAS SystemOntology describing systems connected through network services that are in protocol stacks.

ashleysommer commented 4 years ago

Hi @JoelBender This issue piqued my interest enough to spend a few hours on it. I've come up with a fix.

I have added three new features: 1) pyshacl now tracks its evaluation path between Shapes as it's traversing between shapes during validation execution. 2) Using that evaluation path, after a certain depth it can try to detect if there are loops happening, and break out of them 3) If the evaluation path gets too deep, even without loops, it deliberately errors out, this prevents a different kind of recursion bug.

I've released a new version (0.11.4), I've verified it now works with your example files above.

Note: It does spit out some pretty verbose warnings when it encounters a recursion loop, and it appears to find the same loop multiple times in different ways during execution. However the validation result it returns is correct, despite the warnings.

ashleysommer commented 4 years ago

A final note, I tried your example files with "owlrl" inferencing turn on, and I am getting some strange validation non-conformance reports. I'll have to make a note to look into that, because that example should (if I understand it correctly) validate as conformant.

ashleysommer commented 4 years ago

@JoelBender something else I was thinking...

Did you have a tool generate the 03-Network.ttl SHACL Shape file or did you write it by hand?

I ask that, because I realized why I've never seen this issue come up before. Normally you wouldn't use the sh:node constraint in the way you have.

(aside, the use of the term "Node" both as a type of constraint in SHACL, and as a class type in your Network topology ontology makes this more confusing, but bare with me).

All of the bacnet:Node instances in the graph should be conformant on their own using the bacnet:Node shape, there's no need to force additional conformance when they are encountered on a path. Eg, you could replace:

:Node
    rdf:type                owl:Class, sh:NodeShape ;
    rdfs:comment            "A node is a member of a network with an address." ;
    sh:targetClass          :Node ;
    sh:property [ 
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasAddress ] ;
    sh:property [ sh:node :Network ;
            sh:path :isNodeOf ] ;
    sh:property [ sh:node :Port ;
            sh:path :portReference ] ;
    sh:property [ sh:node :Node ;
            sh:path :layeredAbove ] ;
    sh:property [ sh:node :Node ;
            sh:path :layeredBelow ] .

with simple class-type checks, like this:

:Node
    rdf:type                owl:Class, sh:NodeShape ;
    rdfs:comment            "A node is a member of a network with an address." ;
    sh:targetClass          :Node ;
    sh:property [ 
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasAddress ] ;
    sh:property [ sh:class :Network ;
            sh:path :isNodeOf ] ;
    sh:property [ sh:class :Port ;
            sh:path :portReference ] ;
    sh:property [ sh:class :Node ;
            sh:path :layeredAbove ] ;
    sh:property [ sh:class :Node ;
            sh:path :layeredBelow ] .

This makes the validation simpler, and removes the Shape Recursion. However you lose the ability to put anonymous blank nodes on :layeredAbove and :layeredBelow, etc, unless you specify those blank nodes rdf:type bacnet:Node.

Another observation, with patterns like this:

:Port
    rdf:type            owl:Class, sh:NodeShape ;
    rdfs:comment        "A port is a connection point used to connect one of more devices together." ;
    sh:targetClass      :Port .

You don't need sh:targetClass in there. Because this class is both a sh:NodeShape and an owl:Class, it becomes an Implicit-Class-Target shape, where the NodeShape applies to bacnet:Port even without the sh:targetClass being added, but having it there isn't hurting anything anyway, see here for more details.

JoelBender commented 4 years ago

Yep, written by hand until the bugs are worked out, then I take the pattern and auto-generate lots of them, usually from another schema "language". I started including sh:targetClass with the idea that "explicit is better than implicit" and it worked around some problem I had attempting to validate documents that had not been though the OWLRL or RDFS reasoner first. I think this is also related to the reason I need to add both -s and -e options, but I can't be sure. You can probably put sh:node and sh:class in the same bucket, because the shapes won't validate until after some reasoner has decided that the owl:Class is actually a sh:NodeShape.

The use of :Node is unfortunate, but in my domain "networks" are made up of "nodes," and "objects" that have "properties," so it is what it is. It's going to get worse with :System.

ashleysommer commented 4 years ago

the idea that "explicit is better than implicit"

Thats fair, keeping the explicit targetClass is fine.

The reason you need to use both -s and -e is because you are using your SHACL Shapes file as both the source of the Shapes, and for your ontological class relationships.

When you expand your data graph using "rdfs" or "owlrl" inferencing, the reasoner will ignore the shacl graph passed in as -s. It needs all OWL and RDFS class relationships available to the reasoner in the data graph in order to apply those inferences. That is where the "extra ontology" options -e comes in. You can pass in another RDF file which contains your OWL and RDFS relationships, and they get mixed-in to the Data Graph before the reasoner runs the inferencing step.

In your case, the SHACL shapes file and the OWL/RDFS relationships ontology file are the same file, thats why you need to pass it to both -s and to -e.

shapes won't validate until after some reasoner has decided that the owl:Class is actually a sh:NodeShape

I don't think I follow what you mean there. There is no reasoner ever applied to the SHACL Shapes Graph during any pyshacl run.

ashleysommer commented 4 years ago

closing as resolved