Handle large integer values losslessly?

travisbrown commented 2 years ago

Currently these are formatted with scientific notation (in some cases lossily). For example (from a user JSON object from the Twitter API):

$ xq .id < twitter-test.json 
1.470944601309528e18
$ jq .id < twitter-test.json 
1470944601309528000
$ gojq .id < twitter-test.json 
1470944601309528072

Is this intentional? I'm currently using gojq instead of jq specifically because of how it handles values like this, and the lossless approach seems like it would generally be the least likely to cause issues for users.

MiSawa commented 2 years ago

That is because xq, probably jq, and most other tools that handle JSON use a double-precision floating-point number to represent a JSON Number. Since a double can't represent every integer outside the [-2^53+1, 2^53-1] range exactly, 1470944601309528072, 1.470944601309528e18, and 1470944601309528000 all result in the same double (assuming some rounding mode). gojq has special handling for integers to cover such use cases, but I dropped that support because:

1. I think most other tools that read or write JSON behave the same way, and users should most likely consider another representation (e.g. a string instead of a number, unfortunately) to avoid this kind of incompatibility.
2. It is difficult to define semantics that most people (including me) would agree on. For example, should 2.0 be equal to 2 (big-int)? Probably most people want them to be equal. Should 1.470944601309528e18 be equal to 1470944601309528000 (big-int)? Probably not, since 1.470944601309528e18 was most likely meant to be some other value. Should 1470944601309528000 in the input be treated as a big-int? We can't tell whether it was meant to be that specific integer or was the result of rounding, so we don't know the user's intention.
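
To make the collapse concrete, here is a minimal Python sketch (not xq or jq code; the variable names are made up for illustration). It relies only on Python's arbitrary-precision `int` and its IEEE 754 double `float`, and shows that the three spellings from this thread end up as one double.

```python
# The three spellings discussed above.
exact = 1470944601309528072          # Python ints are arbitrary precision
rounded = 1470944601309528000
scientific = "1.470944601309528e18"

# Converting each one to a double collapses them into a single value.
as_doubles = {float(exact), float(rounded), float(scientific)}
print(len(as_doubles))               # 1

# Between 2^60 and 2^61 adjacent doubles are 256 apart, so the value
# actually stored is the nearest multiple of 256, not the original id.
stored = int(float(exact))
print(stored)                        # 1470944601309528064
print(stored == exact)               # False

# Just past the "safe" range, neighbouring integers already collide.
MAX_SAFE = 2**53 - 1
collide = float(MAX_SAFE + 1) == float(MAX_SAFE + 2)
print(collide)                       # True
```

Note that none of the three printed forms in the original report (`...072`, `...000`, `1.47...e18`) is the stored value itself; they are just different ways of printing or preserving the same input.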

MiSawa commented 2 years ago

Though I do see the value in it. Maybe it would be good to treat integer-looking input as-is as much as possible when the user explicitly asks for it? (related: #93, #82)
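
As a rough illustration of what "treat integer-looking input as-is" could look like, the sketch below uses Python's standard `json` module, which happens to keep integer tokens as arbitrary-precision ints by default. The sample document and the variable names are assumptions for illustration; this says nothing about how xq would implement such an option.

```python
import json
from decimal import Decimal

# A made-up document combining the id from this thread with a decimal.
doc = '{"id": 1470944601309528072, "ratio": 0.0001000}'

# Default behaviour: integer-looking tokens become exact Python ints.
lossless = json.loads(doc)
print(lossless["id"])                # 1470944601309528072

# Emulating a double-only tool: route integer tokens through float.
lossy = json.loads(doc, parse_int=float)
print(int(lossy["id"]))              # 1470944601309528064

# Opt-in preservation for non-integers: keep them as decimals
# instead of doubles, so trailing zeros survive as well.
preserved = json.loads(doc, parse_float=Decimal)
print(preserved["ratio"])            # 0.0001000
```

The `parse_int` and `parse_float` hooks make the lossless behaviour something the caller chooses per type, which is one possible shape for the opt-in suggested above.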

travisbrown commented 2 years ago

@MiSawa Thanks for the reply!

Personally, the general principle I'd prefer in most contexts is that a tool should not change values that the user did not specifically ask it to transform.

I just learned that this is what jq has done for numeric values for a couple of years in the master branch (although not in the latest official release). For example:

$ jq <<< "0.0001000" 
0.0001000
$ jq <<< '18276318.736187263187638172'
18276318.736187263187638172
$ jq <<< '10000000000000000000000000000000000000012'
10000000000000000000000000000000000000012

(gojq gives the same result for the integral value, but drops the trailing zeros on the first example, and rounds the second.)

MiSawa commented 2 years ago

Ah, interesting. They have introduced decimal number arithmetic, so it doesn't just preserve the user's input as a string but actually treats it as a decimal number with the given precision. https://github.com/stedolan/jq/tree/master/src/decNumber

$ ./jq <<< '0.1010e2'
10.10
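
The difference between "keep the input string" and "treat it as a decimal with its given precision" can be sketched with Python's `decimal` module. This is only an analogy, assuming Python's decimal type behaves similarly to the decNumber library for these inputs; it is not jq's implementation.

```python
from decimal import Decimal

# Inputs taken from the examples in this thread.
inputs = [
    "0.0001000",
    "18276318.736187263187638172",
    "10000000000000000000000000000000000000012",
    "0.1010e2",
]

# Constructing a Decimal from text is exact and keeps trailing zeros.
results = [str(Decimal(text)) for text in inputs]
for text, shown in zip(inputs, results):
    print(text, "->", shown)

# The last input is not echoed back verbatim: the exponent is applied
# while the four significant digits are kept, giving 10.10.
print(results[-1])                   # 10.10

# Arithmetic keeps tracking the decimal places as well.
total = Decimal("0.1010e2") + Decimal("1")
print(total)                         # 11.10
```

The first three values print exactly as written, while `0.1010e2` is normalized to `10.10`, matching the jq output shown above: the number is parsed and re-rendered, not copied through as a string.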

MiSawa / xq

Handle large integer values losslessly? #152