eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

Expose shapes used in SHACL validation #1507

Closed: kiramclean closed this issue 4 years ago

kiramclean commented 5 years ago

This would allow developers to construct useful error messages when a validation fails.

For example, validating this person:

@prefix ex: <https://example.com/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

ex:pete a ex:Person ;
    ex:age "18" .

with this shape:

@prefix ex: <https://example.com/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape
    a sh:NodeShape  ;
    sh:targetClass ex:Person ;
    sh:property [
            sh:path ex:age ;
            sh:datatype xsd:integer ;
    ] .

Gives this output (obtained by writing the resulting report as turtle, as described in the example here):

@prefix sh: <http://www.w3.org/ns/shacl#> .

_:node1dinj2s67x15066 a sh:ValidationReport;
  sh:conforms false;
  sh:result _:node1dinj2s67x15067 .

_:node1dinj2s67x15067 a sh:ValidationResult;
  sh:focusNode <https://example.com/ns#pete>;
  sh:resultPath <https://example.com/ns#age>;
  sh:sourceConstraintComponent sh:DatatypeConstraintComponent;
  sh:sourceShape _:node1dinj2s67x14979 .

With only this data, it's tricky to show a useful error message, because the sh:sourceShape of the validation result points to an orphaned node. The blank node _:node1dinj2s67x14979 is not the subject of any statement that is easily accessible, so there is nothing to drill down into to figure out, for example, which datatype we were expecting. That would be far more useful than only knowing that we got the wrong datatype.

I'm not familiar with the rdf4j codebase, but it appears that the shapes used for validation are used (as nodeShapes) in prepare() here, but are then not passed along to the new ShaclSailValidationException on line 391, so they don't end up in the final result.

So, first of all -- is my understanding correct, and there is currently no way to find the predicates and objects that should have the sh:sourceShape of the validation result as their subject (from the validation report alone)? And if so, does it seem reasonable to build a way to pass those along so that consumers can access them (for the purposes of e.g. building human-readable error messages)?

Thanks for a great library. We're excited to be able to use SHACL for validation and hoping to integrate more of it to show useful, human-readable error messages.

hmottestad commented 5 years ago

Hi @kiramclean

Thanks for using the ShaclSail :) I'm the lead developer and have been working on the ShaclSail for the past two years or so.

For this particular case I think the simplest approach is to retrieve the shape from the shapes graph in the ShaclSail:

.....
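} catch (RepositoryException e) {

    // The SHACL validation failure is the cause of the RepositoryException.
    ShaclSailValidationException cause = (ShaclSailValidationException) e.getCause();
    Model actual = cause.validationReportAsModel();

    // Keep only the sh:sourceShape statements of the validation report.
    Model filter = actual.filter(null, SHACL.SOURCE_SHAPE, null);

    filter.forEach(s -> {
        Value object = s.getObject();

        // Assumption: shaclSail is the SailRepository that wraps the ShaclSail.
        try (SailRepositoryConnection connection = shaclSail.getConnection()) {

            // Look the shape up in the shapes graph; the context must be given explicitly.
            try (Stream<Statement> stream = Iterations.stream(connection.getStatements((Resource) object, null, null, RDF4J.SHACL_SHAPE_GRAPH))) {
                List<Statement> collect = stream.collect(Collectors.toList());

                // collect contains the shape!
            }

        }

    });
}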

Due to how everything is implemented you need to explicitly use the SHACL_SHAPE_GRAPH as your context. This graph/context is not available through SPARQL!
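
Once the shape's statements are in hand, building a readable message is mostly string assembly. The following is a minimal sketch in plain Java with no RDF4J types, so it runs standalone. It assumes the predicate/object pairs from the `collect` list have already been copied into a map of IRI strings. The class name, the `describe` helper and the message wording are hypothetical and not part of RDF4J.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShapeMessageSketch {

    static final String SH = "http://www.w3.org/ns/shacl#";

    /**
     * Hypothetical helper: combines the fields of one validation result with
     * the predicate -> object pairs of the shape that sh:sourceShape points to.
     */
    static String describe(String focusNode, String resultPath, String component,
            Map<String, String> shape) {
        StringBuilder sb = new StringBuilder();
        sb.append(focusNode).append(": value of ").append(resultPath)
                .append(" violates ").append(component);

        // If the shape declares sh:datatype, report what was expected.
        String datatype = shape.get(SH + "datatype");
        if (datatype != null) {
            sb.append(" (expected datatype ").append(datatype).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the statements retrieved from the shapes graph.
        Map<String, String> shape = new LinkedHashMap<>();
        shape.put(SH + "path", "https://example.com/ns#age");
        shape.put(SH + "datatype", "http://www.w3.org/2001/XMLSchema#integer");

        System.out.println(describe(
                "https://example.com/ns#pete",
                "https://example.com/ns#age",
                SH + "DatatypeConstraintComponent",
                shape));
    }
}
```

In real code the map would be filled from each Statement's predicate and object, and other constraint predicates such as sh:maxCount could be handled the same way as sh:datatype.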

Another approach is to not use blank nodes. (This is how I usually do it!)

@prefix ex: <https://example.com/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape
    a sh:NodeShape  ;
    sh:targetClass ex:Person ;
    sh:property ex:PersonShapeAgeInteger  .

ex:PersonShapeAgeInteger 
    sh:path ex:age ;
    sh:datatype xsd:integer .
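
With named property shapes, the sh:sourceShape in the report should be a stable IRI such as ex:PersonShapeAgeInteger instead of a generated blank node, so an application can key its messages on that IRI. Here is a minimal plain-Java sketch of that idea, assuming the IRI has already been read from the report as a string. The class, the catalogue and the message text are hypothetical and do not use any RDF4J API.

```java
import java.util.Map;

public class NamedShapeMessages {

    // Hypothetical message catalogue keyed by the IRI of each named property shape.
    static final Map<String, String> MESSAGES = Map.of(
            "https://example.com/ns#PersonShapeAgeInteger",
            "A person's age must be an xsd:integer.");

    // Falls back to a generic message for shapes without a catalogue entry.
    static String messageFor(String sourceShapeIri) {
        return MESSAGES.getOrDefault(sourceShapeIri,
                "Validation failed for shape " + sourceShapeIri);
    }

    public static void main(String[] args) {
        System.out.println(messageFor("https://example.com/ns#PersonShapeAgeInteger"));
    }
}
```

The trade-off is the one raised in the thread: every property shape needs its own name, which makes the shape definitions more verbose.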

As a side note, the SHACL Validation Report support in the ShaclSail is a bit limited (no sh:value, sh:message or sh:resultSeverity is supported yet). The SHACL Validation Report format itself is also quite limited. I miss having an "actual" and an "expected" value, which would be useful for things like datatype and maxCount. I brought this to the attention of the SHACL working group probably 3-4 years ago.

kiramclean commented 5 years ago

This is very helpful! Thank you for the prompt and useful reply. I haven't tried this yet, but it seems to be exactly what I was wondering about. The second approach (defining actual property nodes and using those instead) is one path we started down, but in some cases it can make the shape definitions more verbose than seems ideal.

Anyway, I may be able to give this a shot sometime soon. Thanks again for all the time and effort put in, and for understanding and answering my question! Cheers.

hmottestad commented 5 years ago

As it's likely that others will have use for this info I reckon it should be included in the example code in the documentation.

kiramclean commented 4 years ago

Just wanted to send a thank you -- I got around to doing this and it works great. A note for anyone else looking to do the same, though: We learned that this is only possible after v3.0.0. Once we upgraded it started working as expected.