RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

pySHACL not validating object property instance ranges? #176

Closed tduval-unifylogic closed 1 year ago

tduval-unifylogic commented 1 year ago

Greetings. I really like pySHACL, but I have a small issue, and it's likely I'm doing something incorrectly. I looked at issue #40, which appears to include range validation, and attempted to recreate it with a simple ontology and data graph, but I am not getting a validation error on :event2's range.

Any insight would be helpful. Thank you.

Here is my code:

Python 3.9.6, rdflib 6.1.1, pySHACL 0.20.0

from rdflib import Graph
from pyshacl import validate
onto = """
    @prefix schema: <http://schema.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .

    schema:Organization a owl:Class .
     schema:Person a owl:Class, sh:NodeShape .

    schema:Event a owl:Class, sh:NodeShape ;
        sh:property [ 
            sh:path schema:attendee ;
            sh:node schema:Person           
        ] .

    schema:attendee a owl:ObjectProperty ;
        rdfs:domain schema:Event ;
        rdfs:range schema:Person 
    .
"""
og = Graph().parse(data=onto, format='turtle')
data = """
    @prefix schema: <http://schema.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix : <http://example.org/> .   

    :person1 rdf:type schema:Person .
    :organization1 rdf:type schema:Organization .

    :event1 rdf:type schema:Event ;
        schema:attendee :person1 .
    :event2 rdf:type schema:Event ;
        schema:attendee :organization1 .

"""
data_graph = Graph().parse(data=data, format='turtle')

r = validate(data_graph,
      shacl_graph=og,
      ont_graph=og,
      inference='both',
      abort_on_first=False,
      allow_infos=False,
      allow_warnings=False,
      meta_shacl=True,
      advanced=False,
      js=False,
      debug=False)
conforms, results_graph, results_text = r
print(conforms)

Here is the result I get:

[image: screenshot of the validation result]

ashleysommer commented 1 year ago

Hi @tduval-unifylogic PySHACL only validates data against SHACL constraints. rdfs:range is helpful for describing a RDFS Schema or Ontology, but it is not a SHACL constraint, and PySHACL ignores it in this example.

To achieve what you want in your example here, you need the SHACL ClassConstraintComponent (sh:class). It validates that each value of a given property is an instance of Person or of a subclass of Person.

A working example shapes file would look something like this:

    @prefix schema: <http://schema.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .

    schema:Organization a owl:Class .
    schema:attendee a owl:ObjectProperty .

    schema:Event a owl:Class, sh:NodeShape ;
        sh:name "AttendeeIsPerson" ;
        sh:description "Validates that an event's attendees are instance of class Person." ;
        sh:property [ 
            sh:path schema:attendee ;
            sh:class schema:Person           
        ] .

    schema:Person a owl:Class, sh:NodeShape ;
        sh:name "AttendeeOnEvent" ;
        sh:description "Validates that this person was attendee of instance of Event." ;
        sh:property [ 
            sh:path [ sh:inversePath schema:attendee  ] ;
            sh:class schema:Event           
        ] .

Or if you want to more accurately match the semantics of rdfs:domain and rdfs:range, instead of targeting the classes, you can target using the attendee property itself, like this:

    @prefix schema: <http://schema.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .

    schema:Organization a owl:Class .
    schema:Person a owl:Class .
    schema:Event a owl:Class .
    schema:attendee a owl:ObjectProperty .

    schema:attendeeIsPerson a sh:NodeShape ;
        sh:targetObjectsOf schema:attendee ;
        sh:description "Validates that an event's attendees are instance of class Person." ;
        sh:class schema:Person   .   

    schema:attendeeOnEvent a sh:NodeShape ;
        sh:targetSubjectsOf schema:attendee ;
        sh:description "Validates that every subject of attendee is an instance of class Event." ;
        sh:class schema:Event   .

ajnelson-nist commented 1 year ago

@ashleysommer : I think @tduval-unifylogic's example isn't ignored by SHACL. There were three things that look to me like they interact and cause the "incorrect" (to eyes) passing validation result.

The three things are these two triples:

schema:attendee rdfs:range schema:Person .

:event2 schema:attendee :organization1 .

and the inference='both' parameter on the call to validate.

If inference were none, the example would have raised a validation error. But with inference='both', an additional triple is inferred/expanded/entailed, sufficiently from either RDFS entailment or OWL entailment (IIRC, either entailment scheme would cause this; at least RDFS does):

:organization1 a schema:Person .

So, the SHACL property shape from @tduval-unifylogic is satisfied after RDFS (and/or OWL) entailment has occurred.
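
To make the entailment step concrete, here is a minimal pure-Python sketch of the RDFS range rule (rdfs3) over triples held as plain string tuples. It is not how pySHACL or its reasoner is implemented; the function name and the tuple representation are assumptions chosen for illustration, and only a single pass of the one rule is applied.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_rdfs_range(triples):
    """Return the input triples plus rdf:type triples entailed by rdfs:range.

    Rule rdfs3: if (p rdfs:range C) and (s p o) are present, infer (o rdf:type C).
    """
    ranges = [(s, o) for s, p, o in triples if p == RDFS_RANGE]
    expanded = set(triples)
    for prop, cls in ranges:
        for s, p, o in triples:
            if p == prop:
                expanded.add((o, RDF_TYPE, cls))
    return expanded


triples = {
    ("schema:attendee", RDFS_RANGE, "schema:Person"),
    (":organization1", RDF_TYPE, "schema:Organization"),
    (":event2", RDF_TYPE, "schema:Event"),
    (":event2", "schema:attendee", ":organization1"),
}

expanded = apply_rdfs_range(triples)
inferred = expanded - triples
print(inferred)  # {(':organization1', 'rdf:type', 'schema:Person')}
```

The rule never rejects data; it only adds a type to whatever appears in the object position, which is why a range declaration behaves differently from a validation constraint.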

ajnelson-nist commented 1 year ago

(This next comment is just extra fun on top of the diagnostics.)

I poked around the schema.org documentation, and found that while they have an OWL "render" of the schema.org vocabulary available (see this page, end of "Experimental" section), it includes no OWL disjointness statements. So, it is OWL-consistent with schema.org that you could have a thing that is both a schema:Person and schema:Organization.

For further reference, here is the OWL definition of schema:attendee from their schema, version 15.0.

schema:attendee
        a owl:ObjectProperty ;
        rdfs:label "attendee"@en ;
        rdfs:comment "A person or organization attending the event."@en ;
        rdfs:domain [   
                a owl:Class ;
                owl:unionOf (
                        schema:Event
                ) ;
        ] ;     
        rdfs:isDefinedBy schema:attendee ;
        rdfs:range [    
                a owl:Class ;
                owl:unionOf (
                        schema:Organization
                        schema:Person
                        schema:Text
                        schema:URL
                        schema:Role
                ) ;
        ] ;
        .

Note that there is a difference in rdfs:range between what schema.org provides and what was in the initial example. (And, hm, there also appears to be a bug somewhere that mixed in a few classes in the range.) The non-OWL definition of schema:attendee avoids the rdfs:range conflict by using schema:rangeIncludes instead, which avoids RDFS entailment issues, but means schema.org needs to define their own entailment system:

schema:attendee
    a rdf:Property ;
    rdfs:label "attendee" ;
    rdfs:comment "A person or organization attending the event." ;
    schema:domainIncludes schema:Event ;
    schema:rangeIncludes
        schema:Organization,
        schema:Person
        ;
    .

One last point on disjointness and RDFS expansion: schema.org lacking owl:disjointWith statements entirely means it is also OWL-consistent with the schema.org vocabulary that you could have a thing that is both a schema:Person and schema:AMRadioChannel. rdfs:domain and rdfs:range could somehow end up causing such an inference. That is one reason to choose among methods for how to encode your properties (rdfs:range? schema:rangeIncludes? http://purl.org/dc/dcam/rangeIncludes?), and whether your model should include ontological practices that include some foundational disjoint classes, as well as a mechanism for detecting OWL consistency (namely that no individual is a member of two disjoint classes).
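
A minimal sketch of such a consistency check, in plain Python over (individual, class) pairs. The function name is hypothetical, and the disjointness axiom between schema:Person and schema:Organization is an assumption added for the example, since schema.org does not declare it. Subclass reasoning is deliberately left out.

```python
from collections import defaultdict
from itertools import combinations


def disjointness_violations(type_pairs, disjoint_pairs):
    """Return (individual, class_a, class_b) for each individual typed with
    two classes that are declared disjoint."""
    classes_of = defaultdict(set)
    for individual, cls in type_pairs:
        classes_of[individual].add(cls)

    # Store each disjointness axiom as an unordered pair.
    disjoint = {frozenset(pair) for pair in disjoint_pairs}

    violations = []
    for individual, classes in classes_of.items():
        for a, b in combinations(sorted(classes), 2):
            if frozenset((a, b)) in disjoint:
                violations.append((individual, a, b))
    return violations


# Types after RDFS range expansion has added schema:Person to :organization1.
type_pairs = [
    (":person1", "schema:Person"),
    (":organization1", "schema:Organization"),
    (":organization1", "schema:Person"),
]

# Hypothetical axiom; not present in the schema.org vocabulary.
disjoint_pairs = [("schema:Person", "schema:Organization")]

violations = disjointness_violations(type_pairs, disjoint_pairs)
print(violations)  # [(':organization1', 'schema:Organization', 'schema:Person')]
```

Without the assumed axiom the list is empty, which mirrors the point that the vocabulary as published has nothing to contradict the inferred type.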

tduval-unifylogic commented 1 year ago

@ajnelson-nist thanks so much for your explanation! It makes total sense now and I am getting the result I expect...

Shameless plug (since you folks are into reasoners): Please check out the recently published N3 Builtin Functions Documentation: https://domel.github.io/n3builtins/specification/

ajnelson-nist commented 1 year ago

@tduval-unifylogic you're welcome.

Meanwhile, I've filed a bug on the range expansion issue that came up in discussion, here.

ashleysommer commented 1 year ago

@ajnelson-nist @tduval-unifylogic

> There were three things that look to me like they interact and cause the "incorrect" (to eyes) passing validation result. If inference were none, the example would have raised a validation error.

I agree with you that there is an issue with rdfs:range and rdfs:domain in schema.org, and it is something they need to address.

However, it doesn't change my original answer. There are only two SHACL shapes in the example given: one on schema:Event and the other on schema:Person. The first shape has only one constraint, the sh:property constraint, and it triggers the second shape, schema:Person, which has no SHACL constraints defined on it. The rdfs:range and rdfs:domain declarations are not SHACL constraints, and are not used by PySHACL.

PySHACL sees no constraints to run, so the result is a passing validation result. Whether 'rdfs' inferencing is used or not is irrelevant: when running this example in a Python debugger and stepping through the code, it is easy to see that the validator exits early with a passing validation result after finding no constraints to test.

ajnelson-nist commented 1 year ago

@ashleysommer : Ah! You're right. I saw this:

    schema:Event a owl:Class, sh:NodeShape ;
        sh:property [ 
            sh:path schema:attendee ;
            sh:node schema:Person           
        ] .

and missed that the PropertyShape is using sh:node. I thought it said sh:class. So, that was a misread on my part, and the basis for the rest of my remark. Sorry for the confusion.