amazon-ion / ion-python

A Python implementation of Amazon Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0
260 stars 51 forks source link

Decimal example where value is being changed when using `loads` #132

Closed oscar26 closed 4 years ago

oscar26 commented 4 years ago

Hi,

I'm writing a little utility for converting Ion objects to Python objects and while testing it, I've found a particular example of a Decimal object not being converted properly:

from decimal import Decimal
from amazon.ion.simpleion import dumps, loads

decimal_val = Decimal('1005826281919371473355538432.5')
ion_value = loads(dumps(decimal_val))
print(ion_value)

1005826281919371473355538432 is printed. Note the missing decimal part. dumps apparently doesn't modify the value as in here:

print(dumps(decimal_val, binary=False, omit_version_marker=True))

The value is printed correctly: 1005826281919371473355538432.5, so the problem may be somewhere inside loads.

I've found this example by using the Hypothesis testing library and there are several other cases where my tests pass just fine.

Additional info:

Thank you for your time and effort!

pbcornell commented 4 years ago

Hi Oscar,

Interesting...looks like the problem occurs when dumps() is using a binary writer:

>>> ''.join('%02x ' % x for x in bytearray(dumps(Decimal('1005826281919371473355538432.5'))))
'e0 01 00 ea 5d c1 20 80 00 00 00 00 00 00 00 00 00 00 '

whereas the bytes should be:

e0 01 00 ea 5d c1 20 80 00 00 00 00 00 00 00 00 00 05

The loss occurs in the call to scaleb() at writer_binary_raw.py#L188. As shown below, various invocations of scaleb() on this Decimal results in loss of the trailing 5:

>>> Decimal('1005826281919371473355538432.5').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.5').scaleb(-1)
Decimal('100582628191937147335553843.2')
>>> Decimal('1005826281919371473355538432.5').scaleb(1)
Decimal('1.005826281919371473355538432E+28')

This does not appear to be a problem for decimals with one fewer digits. E.g., if we drop the digit 2 in the ones position, scaleb() preserves all digits:

>>> Decimal('100582628191937147335553843.5').scaleb(0)
Decimal('100582628191937147335553843.5')
>>> Decimal('100582628191937147335553843.5').scaleb(-1)
Decimal('10058262819193714733555384.35')
>>> Decimal('100582628191937147335553843.5').scaleb(1)
Decimal('1005826281919371473355538435')

Thank you for identifying and reporting this bug. Is it causing any problems for you at this time?

oscar26 commented 4 years ago

Hi Peter,

Indeed, interesting it is. Decided to try your snippet on Python 3.5.9, 3.6.10, 3.8.3 and 3.9.0b2 with no success.

Also, it seems like scaleb() is performing a rounding somewhere:

>>> Decimal('1005826281919371473355538432.0').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.1').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.2').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.3').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.4').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.5').scaleb(0)
Decimal('1005826281919371473355538432')
>>> Decimal('1005826281919371473355538432.6').scaleb(0)
Decimal('1005826281919371473355538433')
>>> Decimal('1005826281919371473355538432.7').scaleb(0)
Decimal('1005826281919371473355538433')
>>> Decimal('1005826281919371473355538432.8').scaleb(0)
Decimal('1005826281919371473355538433')
>>> Decimal('1005826281919371473355538432.9').scaleb(0)
Decimal('1005826281919371473355538433')
>>> Decimal('1005826281919371473355538433.0').scaleb(0)
Decimal('1005826281919371473355538433')

Whereas 1 fewer digit:

>>> Decimal('100582628191937147335553843.0').scaleb(0)
Decimal('100582628191937147335553843.0')
>>> Decimal('100582628191937147335553843.1').scaleb(0)
Decimal('100582628191937147335553843.1')
>>> Decimal('100582628191937147335553843.2').scaleb(0)
Decimal('100582628191937147335553843.2')
>>> Decimal('100582628191937147335553843.3').scaleb(0)
Decimal('100582628191937147335553843.3')
>>> Decimal('100582628191937147335553843.4').scaleb(0)
Decimal('100582628191937147335553843.4')
>>> Decimal('100582628191937147335553843.5').scaleb(0)
Decimal('100582628191937147335553843.5')
>>> Decimal('100582628191937147335553843.6').scaleb(0)
Decimal('100582628191937147335553843.6')
>>> Decimal('100582628191937147335553843.7').scaleb(0)
Decimal('100582628191937147335553843.7')
>>> Decimal('100582628191937147335553843.8').scaleb(0)
Decimal('100582628191937147335553843.8')
>>> Decimal('100582628191937147335553843.9').scaleb(0)
Decimal('100582628191937147335553843.9')
>>> Decimal('100582628191937147335553844.0').scaleb(0)
Decimal('100582628191937147335553844.0')

Which seems odd for a library designed for precision.

I've been checking Python's bug tracker and so far found no similar problem (I'll keep searching). Unfortunately, in case a report needs to be submitted, I don't know how to do it for this to be inspected but I'll look into it.

Thank you for investigating this. No problems so far as we're just in a prototyping stage.

Edit: some thoughts are no longer valid.

oscar26 commented 4 years ago

I guess I've found the issue.

In a past project I was dealing with a precision loss problem that wasn't noticed until fuzz testing was added (Hypothesis) and this reminded me of something we've not checked: decimal contexts.

The precision of the defaultcontext is 28 and when dealing with higher precision numbers decimal will round them (look here) and the specific case I reported here is right on the edge (prec=28).

So, the solution is simply increasing the precision:

>>> from decimal import Decimal, getcontext, localcontext
>>> Decimal('1005826281919371473355538432.5').scaleb(0)
Decimal('1005826281919371473355538432')
>>> len('1005826281919371473355538432.5')
30
>>> getcontext().prec
28
>>> with localcontext() as context:
...     context.prec = 30
...     Decimal('1005826281919371473355538432.5').scaleb(0)
... 
Decimal('1005826281919371473355538432.5')

Now, the above solution works only for this specific case and we gotta account for every Decimal possible with something like this:

>>> from decimal import Decimal, localcontext
>>> dec = Decimal('1005826281919371473355538432.5')
>>> with localcontext() as context:
...     _, digits, exponent = dec.as_tuple()
...     context.prec = len(digits) + abs(exponent)
...     dec.scaleb(0)
... 
Decimal('1005826281919371473355538432.5')

I think adding the number of digits and the absolute value of the exponent gives enough precision to handle the scaleb() operation in every case. Now conversion from Ion Decimal to Python Decimal passes all tests.

In your library, the only occurrence of handling precision I've found is in here tests/test_vectors.py#L84 but this is a test.

Anyways, let me know if this was helpful!

Edit: minor formatting.

pbcornell commented 4 years ago

Aha, good find, makes sense! Thank you for digging deeper and sharing your findings. :)

We welcome pull requests in case you're inclined to submit a fix, otherwise we'll get to it sometime....

pbcornell commented 4 years ago

Resolved by #133, closing