Open jdogburck opened 2 years ago
based on issue #237, I made a minor change to get the expected output. In term.py, if we simply change -
if self.datatype == _XSD_DOUBLE:
return sub("\\.?0*e", "e", "%e" % float(self))
to
if self.datatype == _XSD_DOUBLE:
return str(float(self.value))#sub("\\.?0*e", "e", "%e" % float(self))
will this be sufficient?
based on issue #237, I made a minor change to get the expected output ... will this be sufficient?
Yes, if you primarily care about precision, but be aware that the edit changes the datatype from XSD.double to XSD.decimal. As mentioned by @jdogburck, this known limitation was discussed by gromgull and the use of XSD.decimal recommended:
In general, you cannot expect the precision of doubles or floats to be kept - since floating point representation simply isn't precise ... A related gotcha with floating point is that number of significant digits are not kept ... if you care about either of these, use XSD.decimals, which are mapped to python objects using the decimal module
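To make the quoted advice concrete, here is a small stdlib-only sketch (no rdflib involved; the variable names are just illustrative) comparing how a float and a decimal.Decimal hold the same digits:

```python
from decimal import Decimal

text = "0.12345678901234567890"

as_float = float(text)      # binary double: only about 15-17 significant digits survive
as_decimal = Decimal(text)  # exact decimal value, every digit kept

# The Decimal prints back exactly what went in, trailing zero included.
print(str(as_decimal) == text)

# Significant trailing zeros are kept as well.
print(str(Decimal("1.10")))

# The float has already dropped digits, so it no longer matches the Decimal.
print(Decimal(repr(as_float)) == as_decimal)
```

This only shows the Python side of the mapping; how rdflib serializes a given Literal is a separate question.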
The limit of precision is mentioned in (aka “buried in”) the docstring for Literal._literal_n3():
# Only limited precision available for floats:
>>> Literal(0.123456789)._literal_n3(use_plain=True)
u'1.234568e-01'
and is inherited from Python string formatting of e: “With no precision given, uses a precision of 6 digits after the decimal point for float”.
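The six-digit default is easy to reproduce in plain Python without rdflib. A minimal check, with illustrative variable names:

```python
# "%e" with no explicit precision keeps 6 digits after the decimal point,
# so the formatted text no longer parses back to the original float.
x = 0.123456789

default_e = "%e" % x   # precision defaults to 6
round_trip = repr(x)   # shortest string that parses back to exactly x

print(default_e)                  # 1.234568e-01
print(float(default_e) == x)      # False: digits were dropped
print(float(round_trip) == x)     # True
```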
So, yes, the limiting LoC is:
https://github.com/RDFLib/rdflib/blob/24d607015d0d7a759785afcce54bd66c94be8f94/rdflib/term.py#L1376
and I had a play around with setting the precision:
diff --git a/rdflib/term.py b/rdflib/term.py
index c0ea4a28..01d788cc 100644
--- a/rdflib/term.py
+++ b/rdflib/term.py
@@ -1357,6 +1357,16 @@ class Literal(Identifier):
u'"1"^^xsd:integer'
"""
+
+ # https://stackoverflow.com/a/6098154
+ # used to determine number of precise digits in a string
+ def __get_precision(str_value):
+ vals = str_value.split('.')
+ if (vals[0] == '0'):
+ return len(vals[1])
+ else:
+ return len(str_value) - 1
+
if use_plain and self.datatype in _PLAIN_LITERAL_TYPES:
if self.value is not None:
# If self is inf or NaN, we need a datatype
@@ -1373,7 +1383,11 @@ class Literal(Identifier):
# in py >=2.6 the string.format function makes this easier
# we try to produce "pretty" output
if self.datatype == _XSD_DOUBLE:
- return sub("\\.?0*e", "e", "%e" % float(self))
+ if str(self).split(".")[1] > "0":
+ rval = '{:.{}e}'.format(float(self), __get_precision(self) - 1)
+ else:
+ rval = sub("\\.?0*e", "e", "%e" % float(self))
+ return rval
elif self.datatype == _XSD_DECIMAL:
s = "%s" % self
if "." not in s and "e" not in s and "E" not in s:
using this test harness:
def test_literal_n3():
    assert Literal(1)._literal_n3(use_plain=True) == '1'
    assert Literal(1.0)._literal_n3(use_plain=True) == '1e+00'
    assert Literal(1.0, datatype=XSD.decimal)._literal_n3(use_plain=True) == '1.0'
    assert Literal(1.0, datatype=XSD.float)._literal_n3(use_plain=True) == '"1.0"^^<http://www.w3.org/2001/XMLSchema#float>'
    assert Literal("foo", datatype=XSD.string)._literal_n3(use_plain=True) == '"foo"^^<http://www.w3.org/2001/XMLSchema#string>'
    assert Literal(True)._literal_n3(use_plain=True) == 'true'
    assert Literal(False)._literal_n3(use_plain=True) == 'false'
    assert Literal(1.91)._literal_n3(use_plain=True) == '1.91e+00'
    # Only limited precision available for floats:
    Literal(0.123456789)._literal_n3(use_plain=True) == u'1.234568e-01'
    Literal('0.123456789', datatype=XSD.decimal)._literal_n3(use_plain=True) == '0.123456789'
@pytest.mark.parametrize(
"val",
[
# From http://www.datypic.com/sc/xsd/t-xsd_double.html
# xsd str(float(xsd)) literal_n3
("-3E2", "-300.0", '-3e+02'),
("4268.22752E11", "426822752000000.0", '4.268228e+14'),
# ("+24.3e-3", "0.0243", '2.43e-02'), # before
("+24.3e-3", "0.0243", '2.430e-02'),
# ("0.123456789", "0.123456789", '1.234568e-01'), # before
("0.123456789", "0.123456789", '1.23456789e-01'),
("12", "12.0", '1.2e+01'),
("+3.5", "3.5", '3.5e+00'), # any value valid for decimal is also valid for xsd:double
("-INF", "-inf", '"-INF"^^<http://www.w3.org/2001/XMLSchema#double>'), # negative infinity
("-0", "-0.0", '-0e+00'), # 0
("NaN", "nan", '"NaN"^^<http://www.w3.org/2001/XMLSchema#double>'), # Not a Number
# Invalid values
# -3E2.4 - the exponent must be an integer
# 12E - an exponent must be specified if "E" is present
# NAN - values are case-sensitive, must be capitalized correctly
# - an empty value is not valid, unless xsi:nil is used
],
)
def test_xsd_double(val):
    assert Literal(float(val[1]), datatype=XSD.double)._literal_n3(use_plain=True) == val[2]
which seems to be the right area of the city in which the ballpark is located, apart from that annoying trailing zero in '2.430e-02'. And for full compliance, it probably should be '{:.{}E}', not '{:.{}e}'.
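One way to sidestep the trailing-zero problem is to search for the smallest precision whose output still parses back to the same float. This is only a sketch: shortest_exp is a hypothetical helper, not part of rdflib, and it assumes finite inputs (NaN and INF need their own branch).

```python
import math


def shortest_exp(x: float) -> str:
    """Hypothetical helper: shortest "%e"-style text that round-trips x."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinities need separate handling")
    # Try 0..16 digits after the decimal point; 17 significant digits
    # are always enough to identify an IEEE-754 double.
    for digits in range(17):
        text = "%.*e" % (digits, x)
        if float(text) == x:
            return text
    return "%.16e" % x


print(shortest_exp(0.0243))        # 2.43e-02
print(shortest_exp(0.123456789))   # 1.23456789e-01
print(shortest_exp(1.0))           # 1e+00
```

Because the loop stops at the first precision that round-trips, no padding zeros are emitted, and integral values such as 1.0 still come out as '1e+00' like the existing "pretty" output.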
@gjhiggins, Can I take up this issue?
all, sorry for the long hiatus - vacation...
that said - I agree with @gjhiggins - particularly the discussion of lack of specificity WRT precision and the need to preserve types and not convert from double to literal. for strongly typed literals, should we have similar tests for XSD.decimal and XSD.integer too?
Regarding keeping maximum precision, I was puttering around with something similar to the following for precision but it doesn't handle non-numeric cases like NaN and INF. I'm sure @gjhiggins handles other cases this simple snip misses so probably best to ignore it.
tmp = str(float(self))
return tmp if 'e' in tmp else f'{tmp}e+00'
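For what it's worth, the non-numeric cases can be bolted onto that snippet with a guard. The following is a standalone sketch under stated assumptions: double_lexical is a hypothetical name, it works on a plain Python float rather than a Literal, and it returns only the lexical form (it does not decide whether the serializer should emit a plain or a typed literal).

```python
import math


def double_lexical(x: float) -> str:
    """Hypothetical helper: xsd:double-style lexical form for a Python float."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    tmp = str(x)  # in Python 3 this is the shortest round-tripping repr
    # Append an exponent so the text reads as a double, not a decimal.
    return tmp if "e" in tmp else f"{tmp}e+00"


print(double_lexical(0.0243))         # 0.0243e+00
print(double_lexical(1e22))           # 1e+22
print(double_lexical(float("-inf")))  # -INF
```

Without the guard, str(float('nan')) contains no 'e', so the original one-liner would produce 'nane+00'.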
All that said, the heart of my issue is round-tripping graphs and reading in expected values, which is probably captured here, but it would be good to have tests for graph ingest and round-tripping of the pesky abbreviated types (https://www.w3.org/TR/turtle/#abbrev), maybe with tests similar to:
@pytest.mark.parametrize(
"val",
[
# turtle value, double literal value
# Simple cases from https://www.w3.org/TR/turtle/#abbrev
("4.2E9", "4.2E9"), # abbreviated
('"4.2e9"^^<http://www.w3.org/2001/XMLSchema#double>', "4.2e9"),
# Expand with more values from http://www.datypic.com/sc/xsd/t-xsd_double.html
("+24.3e-3", "0.0243"),
("1.23456789e-01", "0.123456789"),
("12e-00", "12.0"),
# ...
],
)
# Verify we get the Literal we expect from known Turtle literals
def test_read_double(val):
    turtle = f'<http://test/s> <http://test/p> {val[0]}.'
    g_in = Graph()
    g_in.parse(data=turtle, format='turtle')
    _, _, input = list(g_in)[0]
    # 1 will do if there are tests for these various forms elsewhere
    assert (input == Literal(float(val[1]), datatype=XSD.double))
    assert (input == Literal(float(val[1])))
    assert (input == Literal(val[1], datatype=XSD.double))
# similarly test_read_decimal(val)
# similarly test_read_integer(val)
@pytest.mark.parametrize(
"val",
[
# double literal value
# Simple cases from https://www.w3.org/TR/turtle/#abbrev
"4.2E9", # abbreviated
# Expand with more values from http://www.datypic.com/sc/xsd/t-xsd_double.html
"24.3e-3",
"1.23456789e-01",
"12e-00",
# ...
],
)
# Verify we can round trip literal values through Turtle serialization / parsing
def test_round_trip_double(val):
    # val is a single string here, so convert it directly (val[0] would be its first character)
    value = Literal(float(val), datatype=XSD.double)
    g_out = Graph()
    g_out.add((URIRef('http://test/s'), URIRef('http://test/p'), value))
    turtle = g_out.serialize(format='turtle', encoding='utf-8')
    g_in = Graph()
    g_in.parse(data=turtle, format='turtle')
    _, _, input = list(g_in)[0]  # cheesy way to grab the literal
    assert (input == value)
# similarly test_round_trip_decimal(val)
# similarly test_round_trip_integer(val)
thanks for the discussion...
Serializing objects typed as XSD.double to Turtle (and its derived TriG) format loses precision after the 6th decimal place in the serialized exponential format. This seems to be a rehash of issue #237, closed in 2013, although the horizon has moved a bit.
I think it's from the "%e" format in Literal._literal_n3() as used by the TurtleSerializer, which forces use_plain=True. The issue isn't present for untyped or Decimal types.
The use case doesn't lend itself to changing the representation of ingested input graphs except, possibly, in the scenario where the precision may exceed IEEE-754 64-bit.
thanks JB