Closed tobiasschweizer closed 2 years ago
For example, -1E4, 1267.43233E12, 12.78e-2, 12 , -0, 0 and INF are all legal literals for float.
https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#float
My assumption is that 100000 is implicitly 100000.0 when typed as xsd:float.
Could it be that 100000 is actually represented as an int in Python?
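The Python side of this question can be checked with the standard library alone. This is a minimal sketch that assumes the JSON-LD parser receives whatever Python's `json` module produces for each `@value`; that is an assumption about the parser, not something confirmed in this thread.

```python
import json

# Parse the three @value variants the way Python's json module does.
as_int = json.loads('{"@value": 100000}')["@value"]
as_float = json.loads('{"@value": 100000.0}')["@value"]
as_str = json.loads('{"@value": "100000"}')["@value"]

print(type(as_int).__name__)    # int
print(type(as_float).__name__)  # float
print(type(as_str).__name__)    # str

# An int is not an instance of float, even though the numbers compare equal.
print(isinstance(as_int, float))  # False
print(as_int == as_float)         # True
```

So a bare 100000 in JSON does arrive as a Python int, which is consistent with the guess above.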
Hi @tobiasschweizer Thanks for the bug report. I think this is a bug in the RDFLib JSON-LD parser. Is it possible for you to test the same example but encoded in Turtle format, to see if the issue remains?
Sure, I will try this and come back to you asap.
I tried the following which worked fine:
"monetaryamount.ttl"
<http://www.example.com/1> a <http://schema.org/MonetaryAmount> ;
<http://schema.org/value> "100000"^^<http://www.w3.org/2001/XMLSchema#float> .
pyshacl -sf json-ld -s shapes.json -df turtle monetaryamount.ttl
Validation Report Conforms: True
Ok, great. Thanks, that confirms the bug lies in the JSON-LD parser. I'll create a corresponding bug in the RDFlib bug tracker.
Ok, thanks. Let me know if I can be of further assistance to substantiate the report.
Hi @tobiasschweizer I finally got a chance to do some testing on this. A simple test:
import rdflib

my_json = """
{
  "@context": {
    "@vocab": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@type": "MonetaryAmount",
  "value": {
    "@type": "xsd:float",
    "@value": 100000
  }
}
"""
g = rdflib.Graph()
g.parse(data=my_json, format="json-ld")
g.print()
This prints
@prefix : <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[] a :MonetaryAmount ;
:value "100000"^^xsd:float .
So it appears there is no bug in the JSON-LD parser: it parses the amount to a float, and when serializing back into Turtle, it remains a float. So the issue must lie elsewhere. I'll look into it further.
Ok, I've worked out one key difference between the JSON-LD example and the Turtle example.
Even though the datatype of both is xsd:float, the "lexical value" of the data in the Turtle version is a string ("1000"), and the lexical of the JSON-LD version is an integer (1000).
When setting up a Literal value, RDFLib has the ability to parse a lexical string into a real value matching the datatype, but only when the lexical value is a string.
This in the past has never been an issue because in Turtle and other RDF data formats, the lexical value of any typed value is always a string. But in JSON-LD, it can clearly be something other than a string.
As a simple example, replace the @value with a string in your JSON-LD:
{
  "@context": {
    "@vocab": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@type": "MonetaryAmount",
  "value": {
    "@type": "xsd:float",
    "@value": "100000"
  }
}
You will see your example now passes as expected.
So now I understand that this issue lies somewhere between the JSON-LD parser and RDFLib's handling of Literal lexical values. It could possibly be fixed by adding an extra translation step in the JSON-LD parser, or by adding an extra conversion of non-string lexicals in RDFLib. It might also be easier to fix it at the PySHACL level by modifying how the datatype constraint works, allowing more kinds of values for xsd:float and xsd:double.
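To make the first option more concrete, here is a rough sketch of what an extra translation step could look like. The helper name `normalize_lexical` and the `FLOAT_TYPES` set are hypothetical and are not part of RDFLib or its JSON-LD parser; the sketch only covers finite numbers.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
FLOAT_TYPES = {XSD + "float", XSD + "double"}


def normalize_lexical(value, datatype):
    """Hypothetical helper: turn a native JSON number into a string lexical form."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if datatype in FLOAT_TYPES and is_number:
        # For finite numbers, repr(float(...)) yields e.g. "100000.0".
        return repr(float(value))
    # Strings and values with other datatypes are passed through unchanged.
    return value


print(normalize_lexical(100000, XSD + "float"))    # 100000.0
print(normalize_lexical("100000", XSD + "float"))  # 100000
```

With a step like this, the literal would always be built from a string, so the existing string-parsing path would apply.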
Hi @ashleysommer ,
Apologies for butting in, but I saw a notice for this fly by and remembered an issue with default datatypes I'd encountered a while ago. Some data from my community was getting flagged after we had an "All non-integer numbers are now xsd:decimal" decision. The standards-section citations are in this commit:
https://github.com/casework/CASE-Examples/commit/af9d622ec5e693ce0a19627199baaaef0bbc5f27
Thanks @ajnelson-nist
That's great to see. Personally, I too always try to use xsd:decimal wherever possible rather than xsd:float or xsd:double. Floats and doubles are plagued by implementation issues: they are treated differently in different programming languages, and it is easy to run into the issue we see in this thread. E.g., should the lexical of 100000 be converted to float? A float in Python is actually a double. So given the datatype is xsd:float, should it still fail validation? Should it really be xsd:double?
I believe the current way that RDFLib handles it is probably fine. After all, there's nothing stopping you from writing:
"cat"^^xsd:float
And RDFLib will happily accept that as a real Literal value, because that's what you've specified, and the value will still be "cat", and the datatype will still be xsd:float. But it would fail the PySHACL datatype constraint of xsd:float.
Similarly, as per the issue described above, the lexical is an int but the datatype is an xsd:float. RDFLib doesn't care: the value is still an int, and the datatype is still xsd:float, but as we see, it does fail the datatype constraint.
Given that xsd:decimal will always have a lexical form of a string (because there are some decimals that cannot be represented as an int, float, or double) and RDFLib will parse it to a Python Decimal when loaded, adopting this practice will solve the class of issues seen here.
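The float, double, and decimal differences mentioned here can be seen with the standard library alone. This is only an illustration of the number types in plain Python; it does not involve RDFLib or PySHACL.

```python
import struct
from decimal import Decimal

# Python's float is a 64-bit double. Round-tripping 0.1 through 32 bits
# (the width xsd:float describes) changes its value.
as_float32 = struct.unpack("f", struct.pack("f", 0.1))[0]
print(as_float32 == 0.1)  # False

# Binary floating point cannot represent 0.1 or 0.2 exactly.
print(0.1 + 0.2 == 0.3)  # False

# Decimal built from string lexical forms keeps the exact decimal value.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```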
Thanks @ashleysommer for looking into this.
So if I understand correctly, instead of "@value": 100000 we could simply write "@value": "100000" to sidestep the problem.
So maybe the source of the problem lies in the isinstance check as mentioned above? 100000 is represented as an int in Python, which is not an instance of float.
Maybe the relations of numeric types need to be taken into account here. I am no Python expert, but I remember that in Java you could assign an int to a variable of type double but not the opposite. So wouldn't the solution be to accept both int and float when doing the check for xsd:float?
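A check along those lines might look like the sketch below. The function name `is_xsd_float_value` is hypothetical and this is a stand-in only; PySHACL's actual check is not shown in this thread.

```python
def is_xsd_float_value(value):
    """Hypothetical check: accept Python int and float, but not bool."""
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float))


print(is_xsd_float_value(100000))    # True
print(is_xsd_float_value(100000.0))  # True
print(is_xsd_float_value("cat"))     # False
print(is_xsd_float_value(True))      # False
```

One caveat is that, unlike in Java, Python does not widen an int to a float automatically, so the check has to name both types.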
@ajnelson-nist this is somehow off-topic but aren't you working on https://github.com/lambdamusic/Ontospy/pull/107? :-)
So if I understand correctly, instead of "@value": 100000 we could simply write "@value": "100000" to sidestep the problem.
That's right. If it is possible to do that in your data files, that is the easiest way forward.
It works because when RDFLib processes a new Literal object, it has special rules for when the lexical value is a string. When it is a string and the literal has a known XSD datatype attached, RDFLib will attempt to parse the string into that format. So the value of the literal will be 100000 as a Python float. On the other hand, when the lexical is an int, RDFLib doesn't know it can convert it, so it keeps the value as an int.
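The rule described here can be modelled in a few lines of plain Python. This is a toy model of the behaviour as described in this thread, not RDFLib's code; the function `python_value` and the shorthand datatype string are made up for illustration.

```python
def python_value(lexical, datatype):
    """Toy model (not RDFLib code): parse only when the lexical is a string."""
    if isinstance(lexical, str) and datatype == "xsd:float":
        try:
            return float(lexical)
        except ValueError:
            # Ill-typed lexical such as "cat": this toy model keeps it unchanged.
            return lexical
    # Non-string lexicals are kept exactly as they were passed in.
    return lexical


print(type(python_value("100000", "xsd:float")).__name__)  # float
print(type(python_value(100000, "xsd:float")).__name__)    # int
```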
Hi there,
Validating an xsd:float gives me an unexpected validation report. I am using "PySHACL Version: 0.19.0".
Example:
shapes graph "shapes.json":
data sample "monetaryamount.json":
pyshacl -sf json-ld -s shapes.json -df json-ld monetaryamount.json
gives me:
Changing the @value to 100000.0 or "100000" makes it pass. However, I think all three variants should be valid, no?
I tried the example above on https://shacl.org/playground/ which worked fine.
Could you tell me whether I am doing something wrong or this is a bug?
Thanks a lot!