RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

SHACL and subPropertyOf #141

Closed ajnelson-nist closed 2 years ago

ajnelson-nist commented 2 years ago

Hi @ashleysommer ,

I am trying to write a small "gluing" SHACL shapes graph that combines two ontologies that have their own SHACL shapes. One of the gluing points is that I want to make a ClassAB that is a subclass of classA and classB. classA has a certain property, propertyA, with a minimum-cardinality constraint, sh:minCount 1, and classB has an analogous property, propertyB, without that cardinality constraint. My use case happens to call for propertyB to be assigned, with the same target, whenever propertyA is assigned. (The benefits of doing this lie in the class hierarchy above and below this "gluing point.")

This feels to me like a good opportunity to try some multiple-inheritance with classes and with properties. However, when I tried defining a subproperty propertyAB, this was not being recognized by pyshacl as a usage of propertyA or propertyB.

I think I know the answer is basically "Yes, that's right," due to this quote from @HolgerKnublauch:

... sh:paths do not look at rdfs:subPropertyOf ...

and also from not seeing the word "subProperty" appear anywhere in the SHACL specification.

I can resort to some scripting to save a little bit of typing, and/or sh:sparql to still make the data guarantees I want to make. But is it true that rdfs:subPropertyOf will just be ignored by SHACL?


This issue can be reproduced with the following shapes graph and instance data. The expected sh:focusNodes are the ones with the comment of XFAIL. The ones actually returned are:

shapes.ttl:

@prefix ex: <http://example.org/ontology/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:ClassA
    a
        owl:Class ,
        sh:NodeShape 
        ;
    sh:property [
        a sh:PropertyShape ;
        sh:minCount 1 ;
        sh:nodeKind sh:BlankNodeOrIRI ;
        sh:path ex:propertyA
    ] ;
    sh:targetClass ex:ClassA ;
    .

ex:ClassB
    a
        owl:Class ,
        sh:NodeShape 
        ;
    sh:property [
        a sh:PropertyShape ;
        sh:minCount 1 ;
        sh:nodeKind sh:BlankNodeOrIRI ;
        sh:path ex:propertyB
    ] ;
    sh:targetClass ex:ClassB ;
    .

ex:propertyA
    a owl:ObjectProperty ;
    .

ex:propertyB
    a owl:ObjectProperty ;
    .

ex:propertyAB
    a owl:ObjectProperty ;
    rdfs:subPropertyOf
        ex:propertyA ,
        ex:propertyB
        ;
    .

data.ttl:

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

kb:classApropertyA
    a ex:ClassA ;
    rdfs:comment "PASS" ;
    ex:propertyA [] ;
    .

kb:classApropertyAB
    a ex:ClassA ;
    rdfs:comment "PASS" ;
    ex:propertyAB [] ;
    .

kb:classApropertyB
    a ex:ClassA ;
    rdfs:comment "XFAIL" ;
    ex:propertyB [] ;
    .

kb:classBpropertyA
    a ex:ClassB ;
    rdfs:comment "XFAIL" ;
    ex:propertyA [] ;
    .

kb:classBpropertyAB
    a ex:ClassB ;
    rdfs:comment "PASS" ;
    ex:propertyAB [] ;
    .

kb:classBpropertyB
    a ex:ClassB ;
    rdfs:comment "PASS" ;
    ex:propertyB [] ;
    .

The command I used to see the failure was:

pyshacl -s shapes.ttl data.ttl

ajnelson-nist commented 2 years ago

I should add: The issue appears whether subPropertyOf is referring to one parent property or two. Here are updated shapes and data graphs. I'd like kb:classApropertyAA to pass, but it is also among the violation focus nodes.


shapes.ttl addition:

ex:propertyAA
    a owl:ObjectProperty ;
    rdfs:subPropertyOf ex:propertyA ;
    .

data.ttl addition:

kb:classApropertyAA
    a ex:ClassA ;
    rdfs:comment "PASS" ;
    ex:propertyAA [] ;
    .

HolgerKnublauch commented 2 years ago

SHACL by itself knows nothing about rdfs:subPropertyOf, but some engines may support sh:entailment

https://www.w3.org/TR/shacl/#shacl-rdfs

which would instruct the validator to make sure that RDF Schema triples are visible to the validation process. In our Jena-based platform we support http://www.w3.org/ns/entailment/RDFS, but the spec is rather relaxed about this entirely optional feature.

ajnelson-nist commented 2 years ago

Thank you, @HolgerKnublauch . I did not know about sh:entailment. I chatted about this in a committee meeting this morning, and someone was quick to find this is a feature known to not be supported yet in pySHACL, per FEATURES.md.

Relatedly, it didn't occur to me to turn on RDFS inferencing with pyshacl's -i flag. Unfortunately, the results for my above examples didn't change when I did. So, maybe this is a bug in the owlrl package's RDFS expansion?

Running the RDFSClosure, I did not see the triple kb:classApropertyAA ex:propertyA $x get generated for the sample above ($x being a Skolemized stand-in for the blank node I used).

Is this actually a bug in owlrl? My understanding of the RDFS semantics document, Section 9.2.1, entailment pattern "rdfs7", is that the triple using the superproperty should have been generated. This test, which I'm willing to add to the OWL-RL repository if found helpful, shows that that triple is not generated. The fourth assertion fails.

(Implementation aside: I modified the data graph to use a named node kb:target everywhere I was using a blank node.)

#!/usr/bin/env python3

# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to title 17 Section 105 of the
# United States Code this software is not subject to copyright
# protection and is in the public domain. NIST assumes no
# responsibility whatsoever for its use by other parties, and makes
# no guarantees, expressed or implied, about its quality,
# reliability, or any other characteristic.
#
# We would appreciate acknowledgement if the software is used.

import owlrl
import rdflib

def test_superproperty_entailment() -> None:
    NS_EX = rdflib.Namespace("http://example.org/ontology/")
    NS_KB = rdflib.Namespace("http://example.org/kb/")

    graph = rdflib.Graph()
    graph.parse("data.ttl")

    assert (NS_KB.classApropertyAA, NS_EX.propertyAA, NS_KB.target) in graph
    assert (NS_KB.classApropertyAA, NS_EX.propertyA, NS_KB.target) not in graph

    closure = owlrl.RDFSClosure.RDFS_Semantics(graph, True, True, True)
    closure.closure()

    assert (NS_KB.classApropertyAA, NS_EX.propertyAA, NS_KB.target) in graph
    assert (NS_KB.classApropertyAA, NS_EX.propertyA, NS_KB.target) in graph

ajnelson-nist commented 2 years ago

@nicholascar , do you happen to know if this is a bug?

ashleysommer commented 2 years ago

Hi @ajnelson-nist I was having some trouble following your example, so I re-wrote it using a concrete example, to demonstrate the issue. See it here: https://gist.github.com/ashleysommer/155b94b36eaa3f5786f52d57d9d42e7b

After getting a better understanding of the problem, I can address some of your questions.

I think I know the answer is basically "Yes, that's right," due to https://github.com/RDFLib/pySHACL/issues/80#issuecomment-868012439 from @HolgerKnublauch:

Correct. PySHACL is behaving as expected in the example, because sh:path does not follow rdfs:subPropertyOf relations. Perhaps that could be made more obvious in potential future SHACL spec revisions.

it didn't occur to me to turn on RDFS inferencing with pyshacl's -i flag. Unfortunately, the results for my above examples didn't change when I did.

My first thought was that "rdfs" style inferencing built into pyshacl should work to solve your issue, but as you have seen, and I confirmed, the subPropertyOf entailment doesn't seem to be working. But it is not a bug. I know where your problem lies.

The RDFS inferencing occurs on the data graph only, based on what is known about the data graph at the time of inferencing. However, your RDFS relationships (i.e., rdfs:subPropertyOf) are not defined in the data graph. So the inferencing engine does not know how to expand the data graph.

Similarly, having the RDFS relationships defined in your SHACL Shapes file is not doing anything, because the shapes file is not expanded by the inferencer.
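To make the point above concrete, here is a minimal, library-free sketch of the rdfs7 entailment pattern (if aaa rdfs:subPropertyOf bbb and xxx aaa yyy, then xxx bbb yyy). It is an illustration only, not how owlrl or pySHACL implement inferencing: triples are plain tuples of prefixed-name strings, and apply_rdfs7 is a hypothetical helper. It shows that the superproperty triple can only be derived when the rdfs:subPropertyOf statement is in the same graph as the data.

```python
# Illustrative only: triples are plain tuples of strings, and apply_rdfs7 is a
# hypothetical helper, not a function from owlrl or pySHACL.
SUBPROP = "rdfs:subPropertyOf"


def apply_rdfs7(triples):
    """Return the input triples plus everything rdfs7 entails from them."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        # (subproperty, superproperty) pairs currently visible in the graph
        pairs = {(s, o) for (s, p, o) in closed if p == SUBPROP}
        for (s, p, o) in list(closed):
            for (sub, sup) in pairs:
                if p == sub and (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed


data = {("kb:classApropertyAA", "ex:propertyAA", "kb:target")}
ontology = {("ex:propertyAA", SUBPROP, "ex:propertyA")}
wanted = ("kb:classApropertyAA", "ex:propertyA", "kb:target")

print(wanted in apply_rdfs7(data))             # data graph alone
print(wanted in apply_rdfs7(data | ontology))  # subPropertyOf triple mixed in
```

The first check prints False and the second prints True: with only the data triples, the rule has no rdfs:subPropertyOf statement to match against, which mirrors the behavior described above.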

For a simple test, move the relevant RDFS relationship definitions from the SHACL shape file into the data graph (and turn on inferencing with -i "rdfs") and you will see your example works as expected. (See this second linked file for the worked example: https://gist.github.com/ashleysommer/a8d34c8693b659eabae82564b6c379d7)

I understand that having these triples defined in your data graph is less than ideal, because they are not data, and it doesn't make sense for them to be in the graph at runtime. So it is for that reason many users of PySHACL adopt the 3-file system. That is - A SHACL Shape file, An Extra Ontology file, and the Data file. PySHACL has the ability to "mix in" extra ontological definitions into the data graph at runtime, before running the inferencing engine, precisely to solve this issue.

For example, create a new file called "ontology.ttl" with the contents:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex: <http://example.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:ClassA a owl:Class .
ex:ClassB a owl:Class .
ex:propertyA a owl:ObjectProperty .
ex:propertyB a owl:ObjectProperty .

ex:propertyAB   a owl:ObjectProperty ;
    rdfs:subPropertyOf ex:propertyA , ex:propertyB .

ex:propertyAA   a owl:ObjectProperty ;
    rdfs:subPropertyOf ex:propertyA .

Then run pyshacl with the -e argument. Like:

pyshacl -s shapes.ttl -e ontology.ttl -i rdfs data.ttl

Then you don't need to have those RDFS definitions in your data graph; PySHACL will add them in for you.

Finally, if you're thinking "Why do I need another ontology file? The shapes.ttl file is my ontology. The SHACL shapes are part of my ontology." then you can simply use that same file with the -e argument.

pyshacl -s shapes.ttl -e shapes.ttl -i rdfs data.ttl

In that case, the whole shapes file will be mixed into the datagraph before inferencing, and you will see your expected output.

nicholascar commented 2 years ago

"...it is for that reason many users of PySHACL adopt the 3-file system. That is - A SHACL Shape file, An Extra Ontology file, and the Data file."

This is what I do in these scenarios. Dear SHACL engine, here are some other rules for my data that are outside of your domain so please apply them first and then do your SHACL thing!

Thanks for the full investigation @ashleysommer!

ajnelson-nist commented 2 years ago

@ashleysommer thank you. I'd forgotten about the distinction between shapes files and ontology files. I followed your rearrangement suggestions, and got the results I was expecting.

I'm now a bit confused by the rdfs:subPropertyOf issue that I saw with owlrl, as I think I had seen the erroneous behavior, yet RDFS expansion worked with your instructions. I'll open a separate ticket about that with a smaller reproducing case on the owlrl tracker, as it seems to be out of scope of this Issue.

Thank you again for the assistance.

ashleysommer commented 2 years ago

I'm now a bit confused by the rdfs:subPropertyOf issue that I saw with owlrl, as I think I had seen the erroneous behavior.

I think it is because in your example, you are still using the "data.ttl" file from earlier. That does not have the rdfs:subPropertyOf relationships defined in it. (unless you added them yourself to the data.ttl file before running that test).

ajnelson-nist commented 2 years ago

Aaaand looking over how I coded the test, you're right, I forgot to load the shapes file. I added the shapes (and ontology) files, and the test passes.

This issue is well and truly closed now. No bug to report, @nicholascar . Thanks again, @ashleysommer !