TopQuadrant / shacl

SHACL API in Java based on Apache Jena
Apache License 2.0

SHACL focus node & source Shape #88

Open alex-randles opened 4 years ago

alex-randles commented 4 years ago

Hi,

I've been using SHACL for a couple of my projects. However, I noticed that the source shape and focus node included in the validation report don't always give the correct information.

For example, I have the following data graph.

rr:predicateObjectMap [
    rr:objectMap [
        rr:column "age" ;
        rr:datatype xsd:string
    ] ;
    rr:predicate dbo:club ;
    rr:termType "djhdhdhd" ;
] ;

And the following shapes graph.

ex:PredicateShape
    a sh:NodeShape ;
    sh:targetObjectsOf rr:predicateMap, rr:predicateObjectMap ;
    sh:targetClass rr:PredicateMap, rr:PredicateObjectMap ;
    sh:property [
        sh:message "wrong term type" ;
        sh:path rr:termType ;
        sh:nodeKind sh:IRI ;
    ] ;
    .

The following validation result will be generated.

[ a       <http://www.w3.org/ns/shacl#ValidationReport> ;
  <http://www.w3.org/ns/shacl#conforms>
          false ;
  <http://www.w3.org/ns/shacl#result>
          [ a       <http://www.w3.org/ns/shacl#ValidationResult> ;
            <http://www.w3.org/ns/shacl#focusNode>
                    [] ;
            <http://www.w3.org/ns/shacl#resultMessage>
                    "wrong term type"  ;
            <http://www.w3.org/ns/shacl#resultPath>
                    rr:predicate ;
            <http://www.w3.org/ns/shacl#resultSeverity>
                    <http://www.w3.org/ns/shacl#Violation> ;
            <http://www.w3.org/ns/shacl#sourceConstraintComponent>
                    <http://www.w3.org/ns/shacl#NodeKindConstraintComponent> ;
            <http://www.w3.org/ns/shacl#sourceShape>
                    []  ;
            <http://www.w3.org/ns/shacl#value>
                    "ddhdhdddgdgggggggggggggggggggggggggggggggggggggggggggggg"
          ] ;

I am curious to know why the focusNode and sourceShape are blank.

Many thanks.

HolgerKnublauch commented 4 years ago

The validation results don't seem to align with the shapes graph: there is no sh:nodeKind constraint.

That aside, if the (property) shape that caused the violation is a blank node then it will be (the same) blank node in the validation results graph. Likewise the focus node. When you execute them from Java the nodes will be .equals() and you can look up the property shape in the shapes graph using the same Jena node object.

alex-randles commented 4 years ago

Thanks for your reply. I added the wrong excerpt of the shapes; I have updated this now.

Would I be able to query the data graph, using the focus node from the validation report, to find the node which is responsible for causing this violation? I would like to create a SPARQL query to update the value within the data graph, thus removing the violation.

HolgerKnublauch commented 4 years ago

Yes, if you have the focus node F you could query dataGraph.find(F, Node.ANY, Node.ANY) in Jena to find the properties of that focus node. And likewise you can query for the property shapes in the shapes graph based on sourceShape.

If this has answered your questions, please close this ticket.

alex-randles commented 4 years ago

Thank you for your reply. I have been trying to query the graph as stated, but it won't return successfully either.

I have tried the following query. The shapes graph is stored in shapes.ttl and the validation report in output.ttl.

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>
SELECT *
FROM NAMED <shapes.ttl>
FROM NAMED <output.ttl>
WHERE {
    GRAPH <output.ttl> { ?s sh:sourceShape ?sourceShape . }
    GRAPH <shapes.ttl> { ?sourceShape ?p ?o }
}

If you could help me resolve this issue, I would be extremely grateful. I have been trying everything!

cygri commented 4 years ago

It looks like you're writing the validation report to a file output.ttl, and then loading it again to run the SPARQL query. This won't work because blank node connections between graphs don't survive writing to file and reloading. It will work if you do the validation and querying from the same Java program, without writing to an intermediary file.

(The other option is to avoid blank nodes, that is, use IRIs instead of blank nodes for the SHACL property shapes and R2RML maps. This may not be possible for your use case.)

alex-randles commented 4 years ago

Interesting, I will try this. How would I change the SHACL property shape shown above to use an IRI instead?

Thanks for your help.

cygri commented 4 years ago

You could do something like:

ex:PredicateShape
    a sh:NodeShape ;
    sh:targetObjectsOf rr:predicateMap, rr:predicateObjectMap ;
    sh:targetClass rr:PredicateMap, rr:PredicateObjectMap ;
    sh:property ex:PredicateShape-termType ;
    .
ex:PredicateShape-termType
    sh:message "wrong term type" ;
    sh:path rr:termType ;
    sh:nodeKind sh:IRI ;
    .

That's standard RDF / Turtle.
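
The same naming approach could be applied to the R2RML maps in the data graph, so that sh:focusNode is an IRI as well. A minimal sketch based on the data excerpt from the original post; the ex: namespace and the names ex:TriplesMap1, ex:TriplesMap1-club and ex:TriplesMap1-club-object are made up for illustration, and the dbo: and xsd: prefixes are assumed to be the usual DBpedia and XML Schema ones:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/mapping#> .   # hypothetical namespace

# The triples map points to a named predicate-object map instead of [ ... ].
ex:TriplesMap1
    rr:predicateObjectMap ex:TriplesMap1-club ;
    .

# Formerly a blank node; this IRI is what would show up as sh:focusNode.
ex:TriplesMap1-club
    rr:predicate dbo:club ;
    rr:termType "djhdhdhd" ;
    rr:objectMap ex:TriplesMap1-club-object ;
    .

# The nested object map gets an IRI in the same way.
ex:TriplesMap1-club-object
    rr:column "age" ;
    rr:datatype xsd:string ;
    .
```

With both the shapes and the maps named, the report should no longer depend on blank node identity, so it can be written to a file and queried later.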

ensaremirerol commented 10 months ago

Apologies for revisiting this issue after three years, but I was wondering if there have been any improvements regarding this problem. I am currently using the topbraid/shacl Java binary from Python for graph validation, and it would be great to include the sourceShapes (especially the blank nodes) in the final report as well. For reference, GraphDB already does that: if there is a blank node in the report, GraphDB appends these blank nodes at the end of the report.

Thanks!

HolgerKnublauch commented 10 months ago

No, this hasn't been done. The main reason is that we have never needed this functionality, as we always call the engine from Java code, where we can simply walk the result nodes by identity without having to go through serialization.

Maybe you can reformulate the shapes so that they start at the URI nodes and then walk into the properties using sh:node or sh:qualifiedValueShape. Then the focus node would be the URI.
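
A rough, untested sketch of what such a reformulation might look like with sh:node. It assumes that the triples maps (the subjects of rr:predicateObjectMap) are identified by IRIs; the ex: namespace and all shape names below are made up for illustration:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/shapes#> .   # hypothetical namespace

# Target the (assumed IRI-identified) triples maps rather than the
# blank-node predicate-object maps.
ex:TriplesMapShape
    a sh:NodeShape ;
    sh:targetSubjectsOf rr:predicateObjectMap ;
    sh:property ex:TriplesMapShape-predicateObjectMap ;
    .

# Walk from the triples map into each predicate-object map via sh:node.
ex:TriplesMapShape-predicateObjectMap
    a sh:PropertyShape ;
    sh:path rr:predicateObjectMap ;
    sh:message "predicate-object map has a wrong term type" ;
    sh:node ex:PredicateObjectMapShape ;
    .

# The original constraint, now only reached through sh:node.
ex:PredicateObjectMapShape
    a sh:NodeShape ;
    sh:property ex:PredicateObjectMapShape-termType ;
    .

ex:PredicateObjectMapShape-termType
    a sh:PropertyShape ;
    sh:path rr:termType ;
    sh:nodeKind sh:IRI ;
    .
```

With shapes along these lines, a violation would be reported against ex:TriplesMapShape-predicateObjectMap with the triples map as sh:focusNode, so both are IRIs. The sh:value of that result would still be the blank-node predicate-object map, and the constraint component would be sh:NodeConstraintComponent rather than sh:NodeKindConstraintComponent.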

ensaremirerol commented 10 months ago

In our use case, reformulating the shapes is quite a hassle. Instead, I managed to add the blank nodes at the end of the report by modifying the source: I simply added a flag named -addShapes (not the best name) and a class that finds blank nodes in the generated report. If you are interested, I can create a PR for review.

HolgerKnublauch commented 10 months ago

Yes, any PR is welcome, if it helps. Maybe call it -addBlankNodes instead of -addShapes?