Open tobiasschweizer opened 2 years ago
I tried the above with https://github.com/TopQuadrant/shacl (CLI) version 1.4.2:
./shaclvalidate.sh -datafile datetime.ttl -shapesfile creativework.ttl
14:49:39 WARN riot :: [line: 2, col: 68] Lexical form '2022-07-08T06:48:22.159262' not valid for datatype XSD date
@prefix dash: <http://datashapes.org/dash#> .
@prefix graphql: <http://datashapes.org/graphql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema1: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix swa: <http://topbraid.org/swa#> .
@prefix tosh: <http://topbraid.org/tosh#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[ rdf:type sh:ValidationReport ;
  sh:conforms false ;
  sh:result [ rdf:type sh:ValidationResult ;
      sh:focusNode <https://openalex.org/W2738724892> ;
      sh:resultMessage "Value must be a valid literal of type date e.g. ('YYYY-MM-DD')" ;
      sh:resultPath schema1:dateCreated ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
      sh:sourceShape [] ;
      sh:value "2022-07-08T06:48:22.159262"^^xsd:date ] ] .
datetime.ttl
<https://openalex.org/W2738724892> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CreativeWork>.
<https://openalex.org/W2738724892> <http://schema.org/dateCreated> "2022-07-08T06:48:22.159262"^^<http://www.w3.org/2001/XMLSchema#date> .
creativework.ttl
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema1: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://rescs.org/dash/creativework/CreativeWorkShape> a sh:NodeShape ;
    rdfs:label "CreativeWork"^^xsd:string ;
    rdfs:comment "The most generic kind of creative work, including books, movies, photographs, software programs, etc."^^xsd:string ;
    sh:property [ sh:datatype xsd:date ;
            sh:description "The date on which the CreativeWork was created or the item was added to a DataFeed." ;
            sh:maxCount 1 ;
            sh:name "dateCreated" ;
            sh:path schema1:dateCreated ] ;
    sh:targetClass schema1:CreativeWork .
There are two things I can see in the output: a warning about the invalid xsd:date lexical form (parsing), and a validation error for the xsd:date value.
Shouldn't pyshacl also report an error for this case? Or is this related to rdflib, which should throw a warning for an invalid xsd:date?
Please let me know if I should provide more information about my use case. Thanks!
Hi @tobiasschweizer
Sorry for the delayed response on this one.
This problem is coming from RDFLib. PySHACL uses the RDFLib library to check whether the Literal's lexical text matches its given datatype.
Note, there was some work done in this area in the lead up to the RDFLib v6.2.0 release, so the new version may have some changes that help with this issue.
Additionally, RDFLib v6.2.0 lets a Literal be flagged as "ill-typed" when its lexical text does not match its given datatype, and PySHACL can now use this flag to help complete the validation checks in the sh:datatype constraint.
A new version of PySHACL (v0.20.0) will be out later today. It uses RDFLib v6.2.0 by default and takes advantage of the new "ill-typed" Literals feature, so please try it and let me know if it solves your issue.
Hi @ashleysommer
No worries, I was on a long holiday in August and did not do anything with RDF for a while ;-)
Thanks for the heads-up. I will gladly try the new pyshacl version and let you know about the outcome.
Sorry, didn't mean to automatically close this
I've just installed pyshacl 0.20.0 and pip automatically updated rdflib to "6.2.0".
However, "2022-07-08T06:48:22.159262" is still regarded as a valid xsd:date.
Thanks. I'll forward that up to the RDFLib team, the fix will lie with them now.
Hi @ashleysommer ,
I've recently updated rdflib to 6.3.1 and I am now getting
in parse_date
raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: ...
So it seems that rdflib performs some actual checking of dates now which is great :-).
I figured that rdflib delegates the date literal parsing to isodate's parse_date: https://github.com/gweis/isodate/blob/8856fdf0e46c7bca00229faa1aae6b7e8ad6e76c/src/isodate/isodates.py#L118
What I found a bit surprising is that rdflib automatically adds day precision to dates with year and month precision. This behaviour comes from isodate:
For incomplete dates, this method chooses the first day for it. For instance if only a century is given, this method returns the 1st of January in year 1 of this century.
So this means that "2016"^^xsd:date in the original data is going to be a "2016-01-01"^^xsd:date when being validated.
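The padding effect can be reproduced with a small standard-library sketch. This is not isodate's implementation: `lenient_parse_date` and `was_padded` are hypothetical helpers that only mimic the documented "choose the first day" behaviour for year-only and year-month input, and then detect the padding by comparing the parsed value with the original text.

```python
import re
from datetime import date

# Accept YYYY, YYYY-MM or YYYY-MM-DD (hypothetical lenient grammar).
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def lenient_parse_date(text: str) -> date:
    """Parse a possibly incomplete date, filling missing parts with 1."""
    match = _PARTIAL_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date: {text!r}")
    year, month, day = match.groups()
    return date(int(year), int(month or 1), int(day or 1))


def was_padded(text: str) -> bool:
    """True if parsing silently added precision the input did not have."""
    return lenient_parse_date(text).isoformat() != text


print(lenient_parse_date("2016"))   # year-only input gains a month and day
print(was_padded("2016"))
print(was_padded("2016-01-01"))
```

A check like `was_padded` is one way a caller could notice that the validated value is no longer the value that was in the data.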
Yeah, I've seen this issue come up before (in Python, outside of the RDF world). I think we would see this same issue with whichever datetime library RDFLib uses. This level of detail in RDF spec seems to be very implementation-specific.
Hi there,
I have a question regarding validation of xsd:date and xsd:dateTime. I am using pyshacl version 0.19.1.
Given the following shapes:
I noticed that schema:dateCreated only has to have the correct type annotation and the value has to be a string to be valid.
So this also does pass validation although it is not an xsd:date but an xsd:dateTime:
Does pyshacl actually check if the given value string is a valid date or is this somehow out of scope?
Thanks for your feedback!