Closed wagonhelm closed 3 years ago
Ideally, questions like this should be asked on Stack Overflow unless they concern a bug. In this case you can change the serialization format by using rdflib.term.bind as follows:
import decimal
import logging
import sys

import rdflib.term
from rdflib import URIRef
from rdflib.namespace import XSD
from SPARQLWrapper import RDFXML, SPARQLWrapper

logging.basicConfig(level=logging.INFO, format="%(message)s")

endpoint_url = "https://query.wikidata.org/sparql"
user_agent = "LINCS-https://lincsproject.ca//%s.%s" % (
    sys.version_info[0],
    sys.version_info[1],
)
sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
sparql.setReturnFormat(RDFXML)
query = """CONSTRUCT { ?s ?p ?o } WHERE { VALUES ?s { wd:Q184832 } ?s ?p ?o }"""
sparql.setQuery(query)
# Rebind xsd:decimal so values are lexicalized in plain (non-scientific) notation.
rdflib.term.bind(
    XSD.decimal,
    decimal.Decimal,
    constructor=decimal.Decimal,
    lexicalizer=lambda val: f"{val:f}",
    datatype_specific=True,
)
# convert() parses the response into an rdflib graph, so bind() is called first.
results = sparql.query().convert()
triples = set(
    results.triples(
        (None, URIRef("http://www.wikidata.org/prop/direct/P2201"), None)
    )
)
for triple in triples:
    (s, p, o) = triple
    logging.info("triple = %s", triple)
    logging.info("str(o) = %s", str(o))
    logging.info("o.value = %s/%s", type(o.value), o.value)
    logging.info("o.n3() = %s", o.n3())
Note that bind is called before the response is decoded (the convert() call).
Full working example here: https://gitlab.com/aucampia/contrib/rdflib/-/blob/master/tests/test_issues.py#L13
Output:
datatype 'http://www.w3.org/2001/XMLSchema#decimal' was already bound. Rebinding.
triple = (rdflib.term.URIRef('http://www.wikidata.org/entity/Q184832'), rdflib.term.URIRef('http://www.wikidata.org/prop/direct/P2201'), rdflib.term.Literal('0.0000000000000000000000000000001', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#decimal')))
str(o) = 0.0000000000000000000000000000001
o.value = <class 'decimal.Decimal'>/1E-31
o.n3() = "0.0000000000000000000000000000001"^^<http://www.w3.org/2001/XMLSchema#decimal>
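A minimal standard-library sketch of why the lexicalizer above helps. It assumes (not verified against the RDFLib source) that the default lexical form of a Decimal ends up coming from str(); the variable names are illustrative only.

```python
from decimal import Decimal

# The P2201 value from the output above, as a Python Decimal.
value = Decimal("1E-31")

# str() switches to scientific notation for very small exponents.
default_form = str(value)

# The "f" presentation type keeps plain positional notation with all digits.
fixed_form = f"{value:f}"

print(default_form)                            # 1E-31
print(fixed_form == "0." + "0" * 30 + "1")     # True
print(Decimal(fixed_form) == value)            # True, no precision is lost
```

The "f" format on a Decimal without an explicit precision keeps every digit, which is what makes it usable as a lexicalizer here.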
Please remember to close the issue
This appears to fix my problem, thank you @aucampia. Originally I wasn't really sure if it was a bug or issue.
@wagonhelm My apologies, I think this is actually a bug. What you wrote in the description made me think you were just looking for a way to achieve something ("I need them in decimal value. How could I go about doing this?") and I did not actually check whether the behaviour is correct.
https://www.w3.org/TR/xmlschema11-2/#decimal
The lexical space of decimal is the set of lexical representations which match the grammar given above, or (equivalently) the regular expression
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)
1E-31 does indeed not match that regex, so a fix is needed.
Please re-open it.
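The check described above can be reproduced with Python's re module. This is only a sketch: the pattern is copied from the XSD 1.1 text quoted above, and the helper name is hypothetical.

```python
import re

# Lexical space of xsd:decimal, as quoted from the XSD 1.1 specification.
XSD_DECIMAL = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def is_valid_decimal_lexical(text: str) -> bool:
    """Return True if the whole string matches the xsd:decimal grammar."""
    return XSD_DECIMAL.fullmatch(text) is not None


print(is_valid_decimal_lexical("1E-31"))                   # False
print(is_valid_decimal_lexical("0." + "0" * 30 + "1"))     # True
```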
@aucampia, I re-opened, though I do not thoroughly understand the issue or the library and fear I'm not using the right words. Ultimately my issue was that when I tried to load the resulting turtle using Wikibase / WDQS I would get an error with any line with scientific notation. When reloading the .ttl using rdflib it results in the following:
triple = %s (rdflib.term.URIRef('http://www.wikidata.org/entity/Q184832'), rdflib.term.URIRef('http://www.wikidata.org/prop/direct/P2201'), rdflib.term.Literal('1e-31', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double')))
str(o) = %s 1e-31
o.value = %s/%s <class 'float'> 1e-31
o.n3() = %s "1e-31"^^<http://www.w3.org/2001/XMLSchema#double>
The floatRep production is equivalent to this regular expression (after whitespace is removed from the regular expression):
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?
|(\+|-)?INF|NaN
The ·value space· of double contains the non-zero numbers m × 2^e, where m is an integer whose absolute value is less than 2^53, and e is an integer between −1074 and 971, inclusive.
@wagonhelm "1e-31" is completely valid for xsd:double, but invalid for xsd:decimal.
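The same kind of check for xsd:double, as a sketch using the floatRep expression quoted above with the whitespace removed; the helper name is hypothetical.

```python
import re

# floatRep production from the XSD 1.1 specification, joined onto one line.
XSD_DOUBLE = re.compile(
    r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?"
    r"|(\+|-)?INF|NaN"
)


def is_valid_double_lexical(text: str) -> bool:
    """Return True if the whole string matches the xsd:double grammar."""
    return XSD_DOUBLE.fullmatch(text) is not None


print(is_valid_double_lexical("1e-31"))   # True
print(is_valid_double_lexical("-INF"))    # True
print(is_valid_double_lexical("1e"))      # False, the exponent needs digits
```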
If I run CONSTRUCT { ?s ?p ?o } WHERE { VALUES (?s ?p) { (wd:Q184832 <http://www.wikidata.org/prop/direct/P2201>) } ?s ?p ?o } against Wikidata it returns this:
curl --silent 'https://query.wikidata.org/sparql' \
    --header "Accept: application/n-triples" \
    --data-urlencode 'query=CONSTRUCT { ?s ?p ?o } WHERE { VALUES (?s ?p) { (wd:Q184832 <http://www.wikidata.org/prop/direct/P2201>) } ?s ?p ?o }'
<http://www.wikidata.org/entity/Q184832> <http://www.wikidata.org/prop/direct/P2201> "0.0000000000000000000000000000001"^^<http://www.w3.org/2001/XMLSchema#decimal> .
For me RDFLib formats that as 1E-31 unless I first do:
rdflib.term.bind(
    XSD.decimal,
    decimal.Decimal,
    constructor=decimal.Decimal,
    lexicalizer=lambda val: f"{val:f}",
    datatype_specific=True,
)
This is a workaround to a real issue, and I would describe the real issue being worked around here as "Invalid serialization of xsd:decimal to scientific notation".
If you prefer xsd:double to be serialized in decimal notation instead of scientific notation, then that is user preference, and the right solution there would be to use rdflib.term.bind as follows (note XSD.double instead of XSD.decimal; also, I did not test this so it may be wrong):
rdflib.term.bind(
    XSD.double,
    float,
    constructor=float,
    lexicalizer=lambda val: f"{val:f}",
    datatype_specific=True,
)
I cannot guarantee this would be compliant with XSD, though I have no specific reason to think it would be non-compliant. It's just user beware.
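One thing worth checking before using the untested float lexicalizer: on a Python float, the "f" format defaults to six digits after the decimal point, so very small values collapse to zero. The sketch below shows this with the standard library only. The plain_float_lexicalizer helper is a hypothetical alternative, not part of RDFLib, and it does not handle INF or NaN.

```python
from decimal import Decimal

value = 1e-31

# "f" on a float keeps only six digits after the point, so the value is lost.
six_digits = f"{value:f}"
print(six_digits)                    # 0.000000
print(float(six_digits) == value)    # False


def plain_float_lexicalizer(val: float) -> str:
    """Hypothetical lexicalizer: plain notation that round-trips finite floats."""
    # repr() gives the shortest round-tripping digits; Decimal's "f" format
    # then writes them out without an exponent.
    return f"{Decimal(repr(val)):f}"


plain = plain_float_lexicalizer(value)
print("e" in plain.lower())          # False
print(float(plain) == value)         # True
```

Whether the extra digits are worth it depends on the consumer; the scientific form is already valid for xsd:double.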
Just noticed a mistake in my last comment and corrected it. I will make a fix for it once #1315 is merged.
I'm having an issue where a CONSTRUCT query using SPARQL returns an rdflib graph with small numbers cast as literals in scientific notation.
and its output
I believe o.value should be class 'float'
If you serialize it this
It outputs: