amazon-ion / ion-docs

Source for the GitHub Pages for Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0
23 stars 23 forks source link

Add tests for decimal values where exponent is larger than 32 bits #79

Open nakedible opened 5 years ago

nakedible commented 5 years ago

Currently, the Ion specification does not have any limitation on the range of the exponent in the decimal value. Obviously decimal values with very large (or very small) exponents are nonsensical but they may cause security vulnerabilities and incompatibilities none the less.

It seems Ion Java currently stores decimal values as BigDecimal in Java, which has a scale parameter which is Java int, meaning value range from -231 to 231-1.

Where as Ion Javascript currently has IonDecimal values which store the exponent as JavaScript number, meaning exact integer value range from -(253-1) to 253-1. The text reader in the implementation allows such values, but the current implementation of binary reader attempts to limit the size of the read exponents to 32 bits, but is implemented incorrectly, so actually limits the size of the exponent to 27 bits.

In cryptographic usage, such as QLDB, it is very important that implementations all treat values the same. For example, if I manage to insert a document in to QLDB that has a large exponent, it is possible that I no longer can verify the hashes using all Ion implementations - or in the worst case, even manage to get differing hashes with differing implementations (if there's no proper overflow checking at every step).

I would suggest amending the Ion specification to explicitly limit range of the exponent in decimal values to be from -227 to 227 for maximum compatibility (and to simplify the implementation of the binary readers so that the exponent is at maximum 4 bytes long).

almann commented 5 years ago

Technically this feedback is not unique to the decimal type. Consider any Ion value that can be larger than an implementation might limit. For example, what about an string that is larger than 2GB? We've been hesitant to limit Ion data in the specification, because today's limitation is potentially tomorrow's use case, but it is clear that some guidelines around minimal implementation might be warranted.

The other problem is JSON compatibility, there is no limit on the number type (nevermind what number means) in this way. E.g. in Python 3.7, this is quite valid:

>>> from decimal import *
>>> from json import *
>>> >>> loads('[1e123456789123456789]', parse_float=Decimal)
[Decimal('1E+123456789123456789')]

Whereas Java does not support this in its BigDecimal type:

$ jshell
|  Welcome to JShell -- Version 11.0.4
|  For an introduction type: /help intro

jshell> import java.math.*;

jshell> new BigDecimal("1e123456789123456789");
|  Exception java.lang.NumberFormatException: Too many nonzero exponent digits.
|        at BigDecimal.parseExp (BigDecimal.java:666)
|        at BigDecimal.<init> (BigDecimal.java:579)
|        at BigDecimal.<init> (BigDecimal.java:401)
|        at BigDecimal.<init> (BigDecimal.java:834)
|        at (#2:1)

So I think we should be a bit careful here in saying the above is invalid Ion. Just because Java chose to use signed int32 for the exponent, shouldn't force the Ion spec to limit decimal. Rust and C# uses decimal128, so is that a better constraint? I am not sure. Any implementation should be able to encode such values or retain them for full fidelity outside of the native platform limitations.

Implementations using Ion, should be able to constrain what they support in reasonable ways. Perhaps it means we should define what limits are reasonable for an implementation, but such specification will always be an RFC-style may or should, which is probably weaker than you'd like.

nakedible commented 5 years ago

It is a fair point to leave this part open ended in the Ion specification, like numbers in JSON.

However, for cryptographic usage in block chains, such as Amazon QLDB, there should be a clearly defined Ion subset which would be supported in every implementation, and Ion Hash should produce the same value in every implementation.

Ultimately, for such usage, there needs to be a subset of Ion Hash for which every byte sequence of inputs will produce the same output for every existing implementation of Ion Hash - either the same hash or an error.

Otherwise, like explained above, it is possible to insert a value in the block chain that makes digest validation impossible for some other implementations. This could even cause problems inside QLDB itself, if one part of it would be implemented using Rust and another using Java.

But, it is totally up to you what you wish to decide on this - I've just brought the issue up.

And originally I raised this in ion-tests, because I think there should be test cases for this regardless of what you decide, because it is easy to see there might be bugs in corner cases like these, and I don't think Ion should end up like http://seriot.ch/json/pruned_results.png