TopQuadrant / shacl

SHACL API in Java based on Apache Jena
Apache License 2.0

validating custom datatypes in SHACL #161

Closed: onderwijsarchitectuur closed this issue 9 months ago

onderwijsarchitectuur commented 10 months ago

Hi,

SHACL works with built-in datatypes, the same as XSD and SPARQL. I found references saying that SHACL also works with custom datatypes defined as rdfs:Datatype. This is useful when the datatype is used in several places. One way to define such a datatype is:

schema:Postcode
    a rdfs:Datatype ;
    owl:onDatatype xsd:string ;
    owl:withRestrictions ( [ xsd:maxLength 6 ] [ xsd:minLength 6 ] ) .

Testing with https://shacl-play.sparna.fr/play/, I find that the SHACL-API ignores the facets in those datatypes. How can I make sure that SHACL validates custom datatypes as well?

Thanks, Gerald
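For illustration, honoring those facets amounts to comparing the length of each literal's lexical form with the xsd:minLength and xsd:maxLength values in the restriction list. Below is a minimal Python sketch. It assumes the restriction list has already been read into plain dictionaries. The dictionary encoding and the satisfies_length_facets name are hypothetical, and no RDF library is involved:

```python
# Hypothetical plain-Python encoding of
#   owl:withRestrictions ( [ xsd:maxLength 6 ] [ xsd:minLength 6 ] )
POSTCODE_RESTRICTIONS: list[dict[str, int]] = [
    {"xsd:maxLength": 6},
    {"xsd:minLength": 6},
]


def satisfies_length_facets(lexical_form: str, restrictions: list[dict[str, int]]) -> bool:
    """Check xsd:minLength / xsd:maxLength facets against a lexical form.

    Length is counted in characters, as XSD does for xsd:string.
    Any other facet in the list is ignored by this sketch.
    """
    length = len(lexical_form)
    for facet in restrictions:
        min_len = facet.get("xsd:minLength")
        if min_len is not None and length < min_len:
            return False
        max_len = facet.get("xsd:maxLength")
        if max_len is not None and length > max_len:
            return False
    return True


print(satisfies_length_facets("1234AB", POSTCODE_RESTRICTIONS))   # True
print(satisfies_length_facets("1234 AB", POSTCODE_RESTRICTIONS))  # False
```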

HolgerKnublauch commented 10 months ago

In SHACL, I would suggest writing this as:

ex:MyClass-myProperty
    a sh:PropertyShape ;
    sh:path ex:myProperty ;
    sh:node schema:Postcode .

schema:Postcode
    a sh:NodeShape ;
    sh:datatype xsd:string ;
    sh:minLength 6 ;
    sh:maxLength 6 .        
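Roughly speaking, the node shape above applies three constraints to each value of ex:myProperty. Below is a minimal Python sketch of those checks. It assumes value nodes are modeled either as (lexical form, datatype IRI) pairs or as plain IRI strings, and blank nodes are not modeled. Literal and postcode_shape_violations are hypothetical names and are not part of the TopQuadrant API:

```python
from typing import NamedTuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """A literal value node: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


# A value node is either a Literal or an IRI, modeled here as a plain str.
Value = Union[Literal, str]


def postcode_shape_violations(value: Value) -> list[str]:
    """Mimic sh:datatype xsd:string, sh:minLength 6 and sh:maxLength 6."""
    violations: list[str] = []
    if not isinstance(value, Literal) or value.datatype != XSD_STRING:
        violations.append("sh:datatype: not an xsd:string literal")
    # sh:minLength / sh:maxLength measure STR(value); for an IRI that is
    # the IRI string itself.
    text = value.lexical if isinstance(value, Literal) else value
    if len(text) < 6:
        violations.append("sh:minLength: fewer than 6 characters")
    if len(text) > 6:
        violations.append("sh:maxLength: more than 6 characters")
    return violations


print(postcode_shape_violations(Literal("1234AB", XSD_STRING)))
print(postcode_shape_violations(Literal("1234", XSD_STRING)))
```

Because the checks are attached to ex:myProperty, a validator only needs to look at the values of that property. It does not need to inspect every literal in the graph.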

I would generally avoid using custom datatypes, because most tools don't understand them. Even in SPARQL you cannot do something like

BIND (STRLEN("12345"^^schema:Postcode) AS ?length)

and you cannot apply mathematical functions to values of types derived from numeric types.

HolgerKnublauch commented 10 months ago

If despite this, you really really want to continue using custom datatypes, you could write a SPARQL-based constraint that examines the xsd:minLength and xsd:maxLength and walks through all literals that may use that datatype.
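Written outside SPARQL, the datatype-driven approach described above could look roughly like this. The code scans every triple and, for each literal whose datatype carries length facets, compares its lexical form against those facets. This is a plain-Python sketch, not a SHACL-SPARQL constraint. The triple encoding, the FACETS table and the ex: names are hypothetical, and prefixed names are kept as plain strings for brevity:

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


# (subject, predicate, object); the object may be a Literal or an IRI string.
Triple = tuple[str, str, object]

# Hypothetical facet table, as it might be read from owl:withRestrictions.
FACETS: dict[str, dict[str, int]] = {
    "schema:Postcode": {"minLength": 6, "maxLength": 6},
}


def facet_violations(triples: list[Triple]) -> list[Triple]:
    """Return every triple whose literal object breaks a length facet of its datatype."""
    bad: list[Triple] = []
    for s, p, o in triples:
        # Only literals whose datatype has known facets are of interest.
        if not isinstance(o, Literal) or o.datatype not in FACETS:
            continue
        facets = FACETS[o.datatype]
        n = len(o.lexical)
        if n < facets.get("minLength", 0) or n > facets.get("maxLength", n):
            bad.append((s, p, o))
    return bad


data: list[Triple] = [
    ("ex:school1", "ex:postcode", Literal("1234AB", "schema:Postcode")),
    ("ex:school2", "ex:postcode", Literal("1234", "schema:Postcode")),
    ("ex:school2", "ex:name", Literal("De Regenboog", "xsd:string")),
]
print(facet_violations(data))
```

The loop has to visit every triple, because any predicate may carry a literal of a custom datatype.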

tfrancart commented 10 months ago

> If despite this, you really really want to continue using custom datatypes, you could write a SPARQL-based constraint that examines the xsd:minLength and xsd:maxLength and walks through all literals that may use that datatype.

A generalization of this approach would be to associate a regex with the rdfs:Datatype and then write a SPARQL-based constraint that tests whether literals using this datatype conform to the regex. This seems like a generic-enough requirement to be included in DASH?
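The regex variant can be sketched the same way, with one pattern per datatype. The sketch makes several assumptions. DATATYPE_PATTERNS and pattern_violations are hypothetical names. The schema:Postcode pattern assumes a Dutch-style postcode (four digits followed by two capital letters). That format is consistent with the six-character length facets, but the issue does not state it:

```python
import re
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


# Hypothetical regex annotation per custom datatype (assumed postcode format).
DATATYPE_PATTERNS: dict[str, str] = {
    "schema:Postcode": r"[0-9]{4}[A-Z]{2}",
}


def pattern_violations(literals: list[Literal]) -> list[Literal]:
    """Return the literals whose lexical form does not match their datatype's regex."""
    # re.fullmatch requires the whole lexical form to match; SPARQL REGEX and
    # sh:pattern match anywhere, so there the pattern would need ^ and $ anchors.
    return [
        lit
        for lit in literals
        if lit.datatype in DATATYPE_PATTERNS
        and re.fullmatch(DATATYPE_PATTERNS[lit.datatype], lit.lexical) is None
    ]


print(pattern_violations([
    Literal("1234AB", "schema:Postcode"),
    Literal("12345A", "schema:Postcode"),
    Literal("anything", "xsd:string"),
]))
```

Like the facet-based version, this approach still requires finding every literal of the datatype first.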

HolgerKnublauch commented 10 months ago

If the RDF model already contains xsd:minLength etc., then I don't see why going through a regex would be beneficial. We need to query what's in the model.

And the problem is to detect all literals of a given datatype. This requires iterating over all triples in the graph! That's why I would model this explicitly, e.g. using sh:node.

onderwijsarchitectuur commented 10 months ago

Thanks for the info

HolgerKnublauch commented 9 months ago

I assume this can be closed...