Closed volkerjaenisch closed 3 months ago
Dear SEMICeu!
I tried the DCAT-AP shapes in two other SHACL processors:
pySHACL
RDF4J and GraphDB. GraphDB is more or less a wrapper around RDF4J. I use GraphDB only to make sure that my RDF4J installation is not flawed.
pySHACL imports the shapes without problems and produces a validation result quite comparable to that of the ISAITB SHACL validator.
Validation Report
Conforms: False
Results (20):
Constraint Violation in NodeConstraintComponent (http://www.w3.org/ns/shacl#NodeConstraintComponent):
Severity: sh:Violation
Source Shape: :Dataset_Property_dct_issued
Focus Node: <https://geobasis-bb.de#dcat_Dataset_568978c5-fa73-48d1-a6f9-487aabdc1aef>
Value Node: Literal("2022-11-17T09:37:25.626872" = None, datatype=xsd:dateTimeStamp)
Result Path: dct:issued
Message: Value does not conform to Shape :DateOrDateTimeDataType_Shape
Process finished with exit code 1
RDF4J throws an exception, and GraphDB does the same. I have no clue what this means. I will open a bug report at RDF4J.
javax.servlet.ServletException: org.eclipse.rdf4j.repository.RepositoryException: Shape with multiple types: <http://www.w3.org/ns/shacl#PropertyShape>, <http://www.w3.org/ns/shacl#NodeShape>
org.eclipse.rdf4j.workbench.proxy.WorkbenchServlet.handleRequest(WorkbenchServlet.java:160)
org.eclipse.rdf4j.workbench.proxy.WorkbenchServlet.service(WorkbenchServlet.java:112)
org.eclipse.rdf4j.workbench.proxy.WorkbenchGateway.service(WorkbenchGateway.java:117)
org.eclipse.rdf4j.workbench.base.AbstractServlet.service(AbstractServlet.java:129)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
org.eclipse.rdf4j.workbench.proxy.CacheFilter.doFilter(CacheFilter.java:64)
org.eclipse.rdf4j.workbench.proxy.CookieCacheControlFilter.doFilter(CookieCacheControlFilter.java:56)
Root Cause
org.eclipse.rdf4j.repository.RepositoryException: Shape with multiple types: <http://www.w3.org/ns/shacl#PropertyShape>, <http://www.w3.org/ns/shacl#NodeShape>
org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1095)
org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeNoContent(SPARQLProtocolSession.java:1049)
org.eclipse.rdf4j.http.client.RDF4JProtocolSession.upload(RDF4JProtocolSession.java:1103)
org.eclipse.rdf4j.http.client.RDF4JProtocolSession.upload(RDF4JProtocolSession.java:928)
org.eclipse.rdf4j.http.client.RDF4JProtocolSession.upload(RDF4JProtocolSession.java:919)
org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.add(HTTPRepositoryConnection.java:447)
org.eclipse.rdf4j.workbench.commands.AddServlet.add(AddServlet.java:94)
org.eclipse.rdf4j.workbench.commands.AddServlet.doPost(AddServlet.java:53)
org.eclipse.rdf4j.workbench.base.TransformationServlet.service(TransformationServlet.java:98)
org.eclipse.rdf4j.workbench.base.AbstractServlet.service(AbstractServlet.java:129)
org.eclipse.rdf4j.workbench.proxy.ProxyRepositoryServlet.service(ProxyRepositoryServlet.java:100)
org.eclipse.rdf4j.workbench.proxy.WorkbenchServlet.service(WorkbenchServlet.java:215)
org.eclipse.rdf4j.workbench.proxy.WorkbenchServlet.handleRequest(WorkbenchServlet.java:137)
org.eclipse.rdf4j.workbench.proxy.WorkbenchServlet.service(WorkbenchServlet.java:112)
org.eclipse.rdf4j.workbench.proxy.WorkbenchGateway.service(WorkbenchGateway.java:117)
org.eclipse.rdf4j.workbench.base.AbstractServlet.service(AbstractServlet.java:129)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
org.eclipse.rdf4j.workbench.proxy.CacheFilter.doFilter(CacheFilter.java:64)
org.eclipse.rdf4j.workbench.proxy.CookieCacheControlFilter.doFilter(CookieCacheControlFilter.java:56)
This outcome is both good and bad. Good, since the problem is not bound to a particular SHACL processor.
Bad, because we do not know why the inheritance is ignored.
Cheers, Volker
@volkerjaenisch, your example is another variant of the topic of "inference"-based validation. The issue is that somewhere outside the rules there is information that would allow one to infer facts that satisfy the constraint.
A classical example of this is Agents. The DCAT-AP rules state that a publisher must be an Agent. Organisation is a subclass of Agent. Therefore, a publisher p1 whose only class declaration is Organisation is sufficient to derive that p1 is an Agent. The validation rule will thus be satisfied if inference takes place that derives, from p1 being an Organisation, that p1 is also an Agent.
But then the question arises: should we include this knowledge or not? And who should supply it? E.g., suppose I have a German classification of Agents in the form of subclasses: why would that classification not be acceptable if one has knowledge of the German classification?
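The Agent example can be sketched in a few lines of plain Python. This is only an illustration of the inference step, not how any SHACL processor is implemented; the triples are modelled as tuples, and the names `ex:p1`, `foaf:Organization` and `foaf:Agent` are assumptions chosen for the example.

```python
# Illustration only: triples are plain (subject, predicate, object) tuples.
# ex:p1, foaf:Organization and foaf:Agent are assumed names for this sketch.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(cls, triples):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_instance_of(node, cls, triples):
    """True if node has a declared type that is cls or a subclass of cls."""
    for s, p, o in triples:
        if s == node and p == RDF_TYPE and cls in superclasses(o, triples):
            return True
    return False


# The data only says that p1 is an Organization.
data = {("ex:p1", RDF_TYPE, "foaf:Organization")}
# The extra knowledge lives outside the data and outside the rules.
ontology = {("foaf:Organization", SUBCLASS_OF, "foaf:Agent")}

print(is_instance_of("ex:p1", "foaf:Agent", data))             # without the extra knowledge
print(is_instance_of("ex:p1", "foaf:Agent", data | ontology))  # with the extra knowledge
```

Whether the check succeeds depends entirely on whether the subclass statement is supplied, which is the "who should supply it" question in concrete form.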
Similarly here, this is the variant with xsd types. The validation rules indeed do not explicitly include xsd:dateTimeStamp, but only xsd:dateTime. According to the definition in https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp it is a subclass of xsd:dateTime.
SHACL provides the means to take rdfs:subClassOf into account, but does not provide a means to include subclass relationships between literal "types".
So the only approach is to spell out the "full hierarchy" in the SHACL expression. While "technically" there is no real harm in adding another case, it is not a future-proof solution.
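The difference between an exact datatype comparison and a derivation-aware one can be sketched as follows. This is a simplified model, not the code of any validator: the `allowed` set is an assumed, reduced stand-in for what :DateOrDateTimeDataType_Shape accepts, and the `DERIVED_FROM` table is maintained by hand.

```python
# Simplified model of datatype matching; not taken from any SHACL processor.
# xsd:dateTimeStamp is derived from xsd:dateTime (timezone required) in XSD 1.1.
# The table is hand-maintained, which is the "not future proof" part.
DERIVED_FROM = {"xsd:dateTimeStamp": "xsd:dateTime"}


def datatype_chain(datatype):
    """Return the datatype followed by the types it is derived from."""
    chain = [datatype]
    while chain[-1] in DERIVED_FROM:
        chain.append(DERIVED_FROM[chain[-1]])
    return chain


def strict_match(literal_datatype, allowed):
    """Exact comparison of the datatype IRI, without any subtype knowledge."""
    return literal_datatype in allowed


def derivation_aware_match(literal_datatype, allowed):
    """Accept a literal if its datatype or one of its base types is allowed."""
    return any(dt in allowed for dt in datatype_chain(literal_datatype))


# Assumption: a reduced version of the datatypes the shape accepts.
allowed = {"xsd:date", "xsd:dateTime"}

print(strict_match("xsd:dateTimeStamp", allowed))
print(derivation_aware_match("xsd:dateTimeStamp", allowed))
```

Adding xsd:dateTimeStamp to `allowed` corresponds to spelling out the hierarchy in the shape; extending `DERIVED_FROM` corresponds to teaching the processor about the derivation.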
To the community, the following questions:
Dear @bertvannuffelen !
Thank you for the detailed analysis. I agree with you, completely. I am quite interested in the answers from the community.
Nearly all our DCAT-AP.de datasets (harvested from ISO19115 data) have xsd:dateTimeStamp (dct:modified, dct:created) as their type. I see some possible ways to deal with that:
1) We change the type. This is only a workaround, since we have several other data providers whose RDF we simply harvest. Parsing all this RDF and changing the type (which is IMHO not an error) is a lot of 'money for nothing'.
2) SHACL learns (implements) inheritance (inference), see also 4).
3) xsd:dateTimeStamp is included in the DCAT-AP shape.
4) Some RDF/OWL snippet is added to the data that SHACL can use to accept xsd:dateTimeStamp.
IMHO 2) is the best solution. 3) is a straightforward fix, which solves the xsd:dateTimeStamp issue and can be removed if 2) ever becomes available.
Solution 4) may be the most flexible way, since it enables the data provider to give the validating instance additional knowledge for the validation.
I tried 4) to no avail. I added to the data
xsd:dateTimeStamp rdfs:subClassOf xsd:dateTime
but no SHACL processor (pySHACL/ITB) uses this information, even with explicitly forced inference (pySHACL). I assume this is what you meant by
SHACL provides the mean to take into account rdfs:subClassOf, but does not provide a mean to include subclass relationships that are in the literal "types".
This is because xsd:dateTimeStamp is a built-in XML Schema datatype and not an RDF entity. Maybe someone wiser than myself can shed a bit of light here or even propose a working solution.
On the other hand, 4) may water down the strictness and consistency of a centralized SHACL validation. This could lead to hacks that let crappy data pass the validation.
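For completeness, option 1) can be sketched as a plain text rewrite, assuming the harvested RDF is available as N-Triples. The helper name `retype_ntriples` is made up for this sketch, and the naive string replacement would also touch a string literal that happens to contain the same character sequence, so it is not a substitute for a real RDF parser.

```python
# Sketch of option 1): retype xsd:dateTimeStamp literals as xsd:dateTime.
# Assumes N-Triples input; retype_ntriples is a hypothetical helper name.
XSD = "http://www.w3.org/2001/XMLSchema#"
OLD = f'"^^<{XSD}dateTimeStamp>'
NEW = f'"^^<{XSD}dateTime>'


def retype_ntriples(text):
    """Replace the datatype IRI on typed literals; lexical values are untouched."""
    return text.replace(OLD, NEW)


# Triple taken from the validation report above, written as N-Triples.
line = (
    "<https://geobasis-bb.de#dcat_Dataset_568978c5-fa73-48d1-a6f9-487aabdc1aef> "
    "<http://purl.org/dc/terms/issued> "
    '"2022-11-17T09:37:25.626872"^^<http://www.w3.org/2001/XMLSchema#dateTimeStamp> .'
)

print(retype_ntriples(line))
```

This shows why 1) is only a workaround: the rewrite has to run on every harvested file and has to be repeated on every new harvest.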
Cheers, Volker
Dear SEMICeu!
We are currently integrating the ISAITB SHACL validator into our portal. While testing we stumbled over
The error message is in German since we use some additions from GovData. So I checked the online version https://www.itb.ec.europa.eu/shacl/dcat-ap/upload, which shows the same problem:
The processing shape is from dcat-ap_2.1.1_shacl_shapes.ttl :
xsd:dateTimeStamp inherits from xsd:dateTime (http://www.datypic.com/sc/xsd11/t-xsd_dateTimeStamp.html). And therefore the shape looks correct to me.
I guess there is an OWL file missing which defines the inheritance via rdfs:subClassOf.
We have roughly 10000 DCAT-AP.de files with xsd:dateTimeStamp from our ISO19115 harvest. So any help is appreciated.
Cheers, Volker
For reference the dataset