TopQuadrant / shacl

SHACL API in Java based on Apache Jena
Apache License 2.0

How to properly use RDFS inference with `sh:closed`? #101

Closed jbkoh closed 3 years ago

jbkoh commented 3 years ago

Hi! I'm trying to use SHACL for a couple of my projects. I would like to understand the relationship between rdfs:subClassOf and sh:closed better. Basically, I would like to use the closed world assumption supported by sh:closed but allowing subclasses to specify more than their superclasses' shapes. For example,

- The schema graph:
```turtle
:ClassA a rdfs:Class.

:ClassB rdfs:subClassOf :ClassA.
```

- The Shape graph:
```turtle
:ShapeA a sh:NodeShape;
    sh:targetClass :ClassA;
    sh:property [sh:path :propA; sh:datatype xsd:string];
    sh:closed true.

:ShapeB a sh:NodeShape;
    sh:targetClass :ClassB;
    sh:property [sh:path :propB; sh:datatype xsd:string];
    sh:property [sh:path :propA; sh:datatype xsd:string];
    sh:closed true.
```

- The Data graph:
```turtle
:instanceA a :ClassA; :propA "valueA".

:instanceB a :ClassB; :propA "valueA"; :propB "valueB".
```

In the RDFS logic, we can infer the following triple
```turtle
:instanceB a :ClassA.
```

So now, instanceB is violating ShapeA's closed assumption.

I like both 1) the RDFS subclass hierarchy across the concepts I would like to model and 2) the sh:closed property to easily verify the entire data set. So my desired outcome would be that the sh:closed property of superclasses' shapes is ignored in SHACL validation. I feel it's a natural modeling practice: if this is only an instance of ClassA, it can only have propA. If that is an instance of ClassB, which is a subclass or an extension of ClassA, it can have both propA and propB.

Would there be a solution for this use case?

Thanks a lot!

irenetq commented 3 years ago

First, on the terminology. In my experience, “closed world assumption” (CWA) refers to the following two items:

  1. Negation as failure, e.g., if we do not have a value for, let’s say, the last name of a Person, we assume that this data does not exist. Thus, if we say that sh:minCount for lastName is 1, we get a violation. With an owl:minCardinality 1 restriction, we would not get a violation.
  2. Unique names, e.g., we assume that resources with different URIs are different resources.

Both of these things are already in SHACL. In other words, SHACL is based on the CWA.
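
As a small illustration of the first point, here is a sketch of a shape and a data node that would trigger the sh:minCount violation described above. The `ex:` namespace, `ex:Person`, `ex:lastName`, `ex:firstName` and `ex:alice` are made-up names for this example only.

```turtle
@prefix ex:  <http://example.org/> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Shapes graph (hypothetical names): every Person needs at least one last name.
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:lastName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
    ] .

# Data graph: no ex:lastName is asserted for ex:alice.
# SHACL treats the missing value as absent and reports a violation;
# an OWL reasoner would only conclude that some unknown last name exists.
ex:alice a ex:Person ;
    ex:firstName "Alice" .
```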

sh:closed is something else. It says that resources targeted by a shape may only have values for the properties described in that shape.

SHACL engines will always do a small bit of RDFS inference using rdf:type/rdfs:subClassOf, as described in the spec: https://www.w3.org/TR/shacl/#terminology. :instanceB is a SHACL instance of :ClassA and, therefore, a target of :ShapeA.

If B is a subclass of A, it is not an extension of A, it is a subset of A - all instances of B are instances of A. Therefore, it is correct that all instances of B must be valid according to shapes that target instances of A. With this, a modeling approach that uses sh:closed while targeting members of a set that has subsets with additional properties you want to allow, seems peculiar to me.

There may be ways for accomplishing what you want if you insist on using sh:closed on the shapes that target classes with subclasses, but they are not straightforward.

  1. You can use sh:ignoredProperties in :ShapeA and list :propB there. However, it will then decide that any instance of A is valid if it has :propB, even if it is not an instance of B. And, of course, you would need to do it for all properties that are allowed for instances of subclasses.
  2. You can use sh:or in defining :ShapeA to say that instances of A (target class) either conform to :ShapeB or to whatever you currently have defined for :ShapeA. Of course, this would require you to treat all subclasses this way and it becomes complex, especially since you already have the complexity of separating node shapes and classes. It will be more straightforward (but still some maintenance issue as you add new subclasses) if you use implicit class targets: https://www.w3.org/TR/shacl/#implicit-targetClass
  3. Possibly, sh:node could be of use. It is an alternative to using target statements, but this is specific to specifying what is valid as a value of a specific property. See https://www.w3.org/TR/shacl/#NodeConstraintComponent. With sh:node, it does not matter if a value of a property is a member of multiple classes, it will only apply the identified shape.
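
For reference, option 1 applied to the shapes from the question might look roughly like the sketch below. The `http://example.org/` namespace bound to `:` is an assumption, since the question does not show its prefixes. rdf:type is also added to the ignored list here, because a closed shape otherwise reports the type triples themselves.

```turtle
@prefix :    <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:ShapeA a sh:NodeShape ;
    sh:targetClass :ClassA ;
    sh:property [ sh:path :propA ; sh:datatype xsd:string ] ;
    sh:closed true ;
    # rdf:type: assumption, added so type triples are not flagged.
    # :propB: tolerated so that instances of :ClassB still conform.
    sh:ignoredProperties ( rdf:type :propB ) .
```

The trade-off mentioned above still applies: a node typed only as :ClassA that carries :propB would also pass this shape.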

There may be some other variations.

Hope this helps,

Irene

jbkoh commented 3 years ago

Hi Irene,

Thanks for the kind response. Everything in it was really helpful. I realized that making ClassA more restrictive than ClassB doesn't make sense, and your solution 1 would be the best fit for my use case.

Thank you so much!

HolgerKnublauch commented 3 years ago

Assuming you are using a SHACL-SPARQL engine (such as TopBraid's) you can use

http://datashapes.org/constraints.html#ClosedByTypesConstraintComponent
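
A rough sketch of how that DASH component might be applied to the classes from the question follows. It assumes, per the DASH documentation, that the property is `dash:closedByTypes` and that the classes are also declared as node shapes (implicit class targets). The `:` namespace is made up, and the exact behavior should be checked against the linked page.

```turtle
@prefix :     <http://example.org/> .
@prefix dash: <http://datashapes.org/dash#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Classes double as shapes, so each class targets its own instances.
:ClassA a rdfs:Class, sh:NodeShape ;
    # Assumption: allowed properties are those declared at the node's
    # rdf:types and their superclasses, rather than at this shape only.
    dash:closedByTypes true ;
    sh:property [ sh:path :propA ; sh:datatype xsd:string ] .

:ClassB a rdfs:Class, sh:NodeShape ;
    rdfs:subClassOf :ClassA ;
    sh:property [ sh:path :propB ; sh:datatype xsd:string ] .
```

Under that reading, :instanceB may carry both :propA and :propB, while a direct instance of :ClassA may only carry :propA.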

tfrancart commented 3 years ago

Hello

On Thu, Oct 1, 2020 at 17:19, Irene Polikoff notifications@github.com wrote:

If B is a subclass of A, it is not an extension of A, it is a subset of A - all instances of B are instances of A. Therefore, it is correct that all instances of B must be valid according to shapes that target instances of A. With this, a modeling approach that uses sh:closed while targeting members of a set that has subsets with additional properties you want to allow, seems peculiar to me.

I found myself in the exact same situation. I feel the relationship between sh:closed and that "little bit of inference that SHACL engines do" is confusing. I'd like to put this in my own words:

  1. My data graph contains direct instances of A and direct instances of B ("x rdf:type A" and "y rdf:type B");
  2. I need to check that the structure of the graph is "closed", that is, direct instances of A only have one set of allowed properties, and direct instances of B only have another set of allowed properties;
  3. If I define a shape that targets class A and a shape that targets class B, I can close each shape and it works fine;
  4. If it happens that B is a subclass of A, then, as Jason described, it does not work anymore "as I would expect".

May be some other variations.

Certainly. The point is that one cannot have both 1/ sh:closed on shapes that target classes with subclasses and 2/ that "little bit of RDFS inference". So I had some other ideas:

  1. don't use sh:targetClass, as it triggers this RDFS inference; instead use a SPARQL target that selects only direct instances of the classes (but SPARQL targets are part of the SHACL Advanced Features);
  2. don't provide the ontology layer with subClassOf relationships to the SHACL engine; I did this, but for some reason I can't remember it was causing other issues;
  3. don't use sh:closed; instead, create one shape per property that validates the domain of the property (in other words, that validates that each property is asserted on an instance of the correct type), using a combination of sh:targetSubjectsOf and sh:class:

    <http://the.property-convertedToShape> a sh:NodeShape ;
        sh:class <http://the.domain.class.of.the.property> ;
        sh:targetSubjectsOf <http://the.property> .

This is not "closed", but at least it lets me check that every property in my knowledge domain is asserted on the correct class. It does not verify whether other properties are asserted as well.
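
The first idea in the list above (a SPARQL target instead of sh:targetClass) could be sketched like this for the shapes from the original question. It requires an engine that supports SHACL Advanced Features. The `http://example.org/` namespace is an assumption, and "direct instance" here simply means an explicit rdf:type triple in a data graph that has not been materialized with RDFS inference.

```turtle
@prefix :    <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:ShapeA a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        # Full IRIs are used so that no sh:prefixes declaration is needed.
        # Only nodes explicitly typed as ClassA are selected; no
        # rdfs:subClassOf traversal happens in this query.
        sh:select """
            SELECT ?this
            WHERE { ?this a <http://example.org/ClassA> . }
            """ ;
    ] ;
    sh:property [ sh:path :propA ; sh:datatype xsd:string ] ;
    sh:closed true ;
    # Assumption: rdf:type is ignored so the type triple is not flagged.
    sh:ignoredProperties ( rdf:type ) .
```

With this target, :instanceB (typed only as :ClassB) would no longer be a focus node of :ShapeA.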

Another thing I find confusing is the (absence of) relationship between sh:closed and the use of property paths in sh:path. sh:closed only works with simple properties asserted in sh:path. AFAIK this is not something explicit in the spec. It does not take into account properties that are part of a property path. So I sometimes find myself expressing similar things twice: once with a simple property to be picked up by sh:closed, and once with a property path to express the constraint I need. E.g., I need to verify the following:

  1. class :C can have property :p1 with instances of :D as value
  2. :inverse-p1 is the inverse property of :p1
  3. class :C needs to have at least one value for either :p1 or its inverse
  4. I want closed shapes;

I'd like to write:

    ex:MyNodeShape a sh:NodeShape ;
        sh:targetClass :C ;
        sh:property [
            sh:path [ sh:alternativePath ( :p1 [ sh:inversePath crm:inverse-p1 ] ) ] ;
            sh:minCount 1 ;
            sh:class :D ;
        ] ;
        sh:closed true .

But sh:closed does not understand that :p1 is "allowed" on :C, because it's hidden in the path. So I need to write:

    ex:MyNodeShape a sh:NodeShape ;
        sh:targetClass :C ;
        sh:property [
            sh:path [ sh:alternativePath ( :p1 [ sh:inversePath crm:inverse-p1 ] ) ] ;
            sh:minCount 1 ;
        ] ;
        sh:property [
            sh:path :p1 ;
            sh:class :D ;
        ] ;
        sh:closed true .

Best regards,

Thomas

--

Thomas Francart - SPARNA
Web of data | Information architecture | Knowledge access
blog: blog.sparna.fr, site: sparna.fr, linkedin: fr.linkedin.com/in/thomasfrancart
tel: +33 (0)6.71.11.25.97, skype: francartthomas