RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

sh:or inside sh:not #74

Closed jbkoh closed 3 years ago

jbkoh commented 3 years ago

Hi there, I'm trying to implement validation like owl:disjointWith in SHACL. To do that, I'm applying sh:not with sh:or as in the below example.

shape graph

@prefix ex: <https://example.com#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

ex:AAA a owl:Class.
ex:BBB a owl:Class.
ex:CCC a owl:Class.

ex:Shape a sh:NodeShape ;
    sh:targetClass ex:AAA;
    sh:not [
        a sh:PropertyShape ;
        sh:path rdf:type;
        sh:or (
            [sh:hasValue ex:BBB;]
            [sh:hasValue ex:CCC;]
        )
    ];
    .

However, the data graph with an instance whose type is both AAA and BBB is not detected as not conforming.

The validation code with data graph:

import rdflib
import pyshacl

shape_g = rdflib.Graph().parse('sample_shape.ttl', format='turtle')
data_g  = rdflib.Graph().parse(data="""
@prefix ex: <https://example.com#> .

ex:aaa a ex:AAA.
ex:aaa a ex:BBB.
ex:aaa a ex:CCC.
""", format='turtle')

conforms, results_graph, results_text = pyshacl.validate(
    data_g, shacl_graph=shape_g,
)
assert not conforms

This is failing by not generating any validation errors. The shape works without sh:or, so I'm curious whether I misunderstood the concept of sh:or or there is a possible bug.

Any input would be appreciated. Thanks a lot.

ashleysommer commented 3 years ago

Hi @jbkoh I'm going to attempt a replication and investigation of this now. I'll let you know what I find.

ashleysommer commented 3 years ago

Ok, I've looked into it, and it looks like it's not a bug in PySHACL. WARNING: long reply! This will probably contain minor errors, and I will probably go back and fix some paragraphs later.

For sanity checking, I ran the same validation through the online SHACL Playground, which uses a completely independent SHACL engine called shacl-js, and it gives the same result that PySHACL does.

The main problem is to do with how PropertyShapes interact with logical constraints like sh:not and sh:or. In short, it is very difficult to invert the result of a PropertyShape (using sh:not) and get the outcome you are expecting.

Stepping through it: the PropertyShape with sh:path rdf:type has a focus node of ex:aaa and value nodes of (ex:AAA, ex:BBB, ex:CCC). sh:or is what is termed a "shape-expecting constraint". That means it contains a set of shapes and executes those shapes once for every value node that is passed into it. So each of these value nodes gets passed into the sh:or constraint and tested against the sh:hasValue constraints. It goes like this:

ex:AAA sh:hasValue ex:BBB -> False
ex:AAA sh:hasValue ex:CCC -> False
Both of those or'd together = False.

ex:BBB sh:hasValue ex:BBB -> True
ex:BBB sh:hasValue ex:CCC -> False
Both of those or'd together = True.

ex:CCC sh:hasValue ex:BBB -> False
ex:CCC sh:hasValue ex:CCC -> True
Both of those or'd together = True.

That means that even though the second and third value nodes pass the sh:or check, the first one never does, so the overall validation result for the PropertyShape itself is False.

Then the non-conformant PropertyShape gets inverted by sh:not, and the whole NodeShape passes as valid. That is the result we are seeing in both PySHACL and the SHACL Playground.

Hacking in some debugging output into the PySHACL code, I generated this output which might help to illustrate the problem:

[<Shape p=False node=https://example.com#Shape>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL15C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL15C13>] Fails
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL16C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL16C13>] Fails
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL15C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL15C13>] Fails
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL16C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL16C13>] Passes
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL15C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL15C13>] Passes
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL16C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>, <OrConstraintComponent>, <Shape p=False node=ub1bL16C13>] Fails
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=True node=ub1bL11C12>] Fails
[<Shape p=False node=https://example.com#Shape>] Passes

You can see the sh:hasValue constraint is executed 6 times (twice for each valueNode of sh:path). This is not what we want.
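
For anyone who wants to poke at the evaluation order without a SHACL engine, here is a rough plain-Python model of the walk-through above. It is only a sketch of the logic as described in this reply, not PySHACL's actual code; the names value_nodes, or_members and or_passes are made up for illustration.

```python
# Toy model of the nested evaluation; plain Python, not pySHACL internals.
value_nodes = ["ex:AAA", "ex:BBB", "ex:CCC"]  # rdf:type values of ex:aaa
or_members = ["ex:BBB", "ex:CCC"]             # one sh:hasValue shape each

has_value_checks = 0


def or_passes(value_node):
    """Evaluate every sh:or member against a single value node."""
    global has_value_checks
    results = []
    for wanted in or_members:
        has_value_checks += 1
        # The member is a node shape, so its only value node is value_node.
        results.append(value_node == wanted)
    return any(results)


# The property shape conforms only if every value node passes sh:or.
per_value = {node: or_passes(node) for node in value_nodes}
property_shape_conforms = all(per_value.values())
# sh:not inverts the result of the nested property shape.
node_shape_conforms = not property_shape_conforms

print(per_value)
print("sh:hasValue checks:", has_value_checks)
print("property shape conforms:", property_shape_conforms)
print("ex:Shape conforms:", node_shape_conforms)
```

The ex:AAA value is what drags the property shape to "does not conform", and sh:not then flips that into a pass for ex:Shape.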

There are, however, a few different ways we can achieve what you're trying to do:

1) Simplest change - Move the property shapes into the sh:or

ex:Shape a sh:NodeShape ;
    sh:targetClass ex:AAA;
    sh:not [
        sh:or (
            [sh:path rdf:type; sh:hasValue ex:BBB;]
            [sh:path rdf:type; sh:hasValue ex:CCC;]
        )
    ]  .

sh:hasValue is not a "shape-expecting constraint". It has no child shapes. In this case, having sh:hasValue on the same shape as sh:path means it can take all of the value nodes at the same time and produce a single result:

(ex:AAA, ex:BBB, ex:CCC) sh:hasValue ex:BBB = True
(ex:AAA, ex:BBB, ex:CCC) sh:hasValue ex:CCC = True
Both of these OR'd together = True

Invert that with sh:not and you get False, which is what we are looking for. This time the debug output looks like this:

[<Shape p=False node=https://example.com#Shape>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=False node=ub1bL21C12>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=False node=ub1bL21C12>, <OrConstraintComponent>, <Shape p=True node=ub1bL24C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=False node=ub1bL21C12>, <OrConstraintComponent>, <Shape p=True node=ub1bL24C13>] Passes
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=False node=ub1bL21C12>, <OrConstraintComponent>, <Shape p=True node=ub1bL23C13>] Start
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=False node=ub1bL21C12>, <OrConstraintComponent>, <Shape p=True node=ub1bL23C13>] Passes
[<Shape p=False node=https://example.com#Shape>, <NotConstraintComponent>, <Shape p=False node=ub1bL21C12>] Passes
[<Shape p=False node=https://example.com#Shape>] Fails

You can see that this time sh:hasValue is only executed twice, which is all that is needed to determine conformance.
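
The same kind of toy model for option 1, again as a sketch under the assumptions in this reply rather than real PySHACL code. The helper names has_value and conforms are hypothetical; the point is that sh:hasValue now sees the whole set of value nodes at once.

```python
# Toy model of option 1: sh:hasValue sits on the same shape as sh:path,
# so it is checked once against the full set of rdf:type values.

def has_value(values, wanted):
    """sh:hasValue on a property shape: wanted must be among the values."""
    return wanted in values


def conforms(rdf_types):
    """sh:not [ sh:or ( hasValue ex:BBB, hasValue ex:CCC ) ]"""
    or_result = has_value(rdf_types, "ex:BBB") or has_value(rdf_types, "ex:CCC")
    return not or_result


print(conforms({"ex:AAA", "ex:BBB", "ex:CCC"}))  # the ex:aaa from the issue
print(conforms({"ex:AAA"}))                      # an instance of ex:AAA only
```

An instance typed only as ex:AAA still conforms here, while anything that also carries ex:BBB or ex:CCC does not.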

2) Similar to the above, but flipping some logic around. Move the sh:not into the sh:or, but change sh:or to sh:and.

ex:Shape a sh:NodeShape ;
    sh:targetClass ex:AAA;
    sh:and (
        [sh:not [sh:path rdf:type; sh:hasValue ex:BBB;]]
        [sh:not [sh:path rdf:type; sh:hasValue ex:CCC;]]
    ).

This is exactly the same logic as the above, and gives the same result, but due to flipped sh:and and sh:not, it is executed differently within PySHACL.

[<Shape p=False node=https://example.com#Shape>] Start
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL15C5>] Start
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL15C5>, <NotConstraintComponent>, <Shape p=True node=ub1bL15C13>] Start
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL15C5>, <NotConstraintComponent>, <Shape p=True node=ub1bL15C13>] Passes
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL15C5>] Fails
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL12C5>] Start
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL12C5>, <NotConstraintComponent>, <Shape p=True node=ub1bL12C13>] Start
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL12C5>, <NotConstraintComponent>, <Shape p=True node=ub1bL12C13>] Passes
[<Shape p=False node=https://example.com#Shape>, <AndConstraintComponent>, <Shape p=False node=ub1bL12C5>] Fails
[<Shape p=False node=https://example.com#Shape>] Fails
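
The "same logic" claim for options 1 and 2 is just De Morgan's law: not (B or C) equals (not B) and (not C). A quick brute-force check in plain Python, where has_bbb and has_ccc are illustrative stand-ins for the two sh:hasValue results:

```python
from itertools import product


def option_1(has_bbb, has_ccc):
    # sh:not [ sh:or ( ... ) ]
    return not (has_bbb or has_ccc)


def option_2(has_bbb, has_ccc):
    # sh:and ( [ sh:not ... ] [ sh:not ... ] )
    return (not has_bbb) and (not has_ccc)


# Enumerate all four combinations of the two sh:hasValue outcomes.
rows = [
    (b, c, option_1(b, c), option_2(b, c))
    for b, c in product([False, True], repeat=2)
]
equivalent = all(first == second for _, _, first, second in rows)

for row in rows:
    print(row)
print("equivalent:", equivalent)
```

So the two shapes always agree on conformance; only the order in which the engine visits the constraints differs.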

3) Use sh:class instead of PropertyShape and Path.

I get that there is probably a good reason you're using sh:path rdf:type, but SHACL has the built-in sh:class constraint, supported by PySHACL, that will do this (and it also takes rdfs:subClassOf relationships into account for free).

ex:Shape a sh:NodeShape ;
    sh:targetClass ex:AAA;
    sh:and (
        [sh:not [sh:class ex:BBB]]
        [sh:not [sh:class ex:CCC]]
    ).

You can see this still uses the same sh:and and sh:not pattern as above, but removes the need to have any PropertyShapes.
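
As a rough illustration of what sh:class checks, here is a simplified plain-Python model: a node counts as an instance of a class if one of its rdf:type values is that class or reaches it by following rdfs:subClassOf. This is a sketch of the idea, not PySHACL's implementation, and ex:SubBBB, ex:Mid, ex:bad and ex:ok are hypothetical names added just for the example. It assumes the subclass triples are visible to the validator.

```python
# Hypothetical data: rdf:type values per node and rdfs:subClassOf edges.
TYPES = {
    "ex:bad": {"ex:AAA", "ex:SubBBB"},
    "ex:ok": {"ex:AAA"},
}
SUBCLASS_OF = {
    "ex:SubBBB": {"ex:Mid"},
    "ex:Mid": {"ex:BBB"},
}


def superclasses(cls):
    """Return cls plus everything reachable via rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance(node, cls):
    """Simplified sh:class check for a single node."""
    return any(cls in superclasses(t) for t in TYPES.get(node, ()))


def conforms(node):
    """sh:and ( [sh:not [sh:class ex:BBB]] [sh:not [sh:class ex:CCC]] )"""
    return (not is_instance(node, "ex:BBB")) and (not is_instance(node, "ex:CCC"))


print("ex:bad conforms:", conforms("ex:bad"))
print("ex:ok conforms:", conforms("ex:ok"))
```

In this model ex:bad is rejected because ex:SubBBB leads to ex:BBB through the subclass chain, which a plain sh:hasValue on rdf:type would not notice.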

4) Complete the loop

Now that we're using sh:class we don't have a PropertyShape, so we can go back to using the original sh:not and sh:or setup:

ex:Shape a sh:NodeShape ;
    sh:targetClass ex:AAA;
    sh:not [
      sh:or (
          [sh:class ex:BBB]
          [sh:class ex:CCC]
      )
    ].

This is probably the form I'd use in this situation.

Hope this helped!

jbkoh commented 3 years ago

Thanks a lot for the detailed answers! Learned a lot about the logic and alternatives. The key point seems to be that the rdf:type ex:AAA is not treated specially in the sh:or logic, which makes sense. Your option 4 is the most attractive to me as it's semantically what I want to represent. I just didn't know sh:class can be applied to NodeShape as well.

Again, thanks a lot for the quick and detailed response!