akubera / bigdecimal-rs

Arbitrary precision decimal crate for Rust
Other
275 stars 71 forks source link

Incorrect parsing of JSON numbers in v0.4 (Regression) #113

Open probablykasper opened 1 year ago

probablykasper commented 1 year ago

In a JSON file, I have numbers that add up to exactly 50:

"splits": [
  { "share": 16.7 },
  { "share": 16.7 },
  { "share": 16.7 },
],
// ...

Thankfully, when parsing them I verify that they add up.

In v0.3 these numbers were correctly parsed as 16.7, but in v0.4.1 they parse to 16.699999999999999289457264239899814128875732421875.

akubera commented 10 months ago

A solution of this is being discussed in #117. But in general I think JSON specifies that numbers are f64, so using those as storage is not considered safe, and should probably be strings.

For example, in Python:

import json, decimal

j = json.loads("""
{"splits": [
   { "share": 16.7 },
   { "share": 16.7 },
   { "share": 16.7 }
]}
""")
shares = [decimal.Decimal(split['share']) for split in j['splits']]
sum(shares)  # Decimal('50.09999999999999786837179272')

and more to the point:

>>> 16.7*3
50.099999999999994

But with strings, keeping the digits explicit:

j = json.loads("""
{"splits": [
   { "share": "16.7" },
   { "share": "16.7" },
   { "share": "16.7" }
]}
""")
sum(Decimal(split['share']) for split in j['splits'])  # Decimal('50.1')

But I understand: if you trust the text of JSON anyways, the standards really don't matter and you want to deserialize to the objects you expect.

Are you parsing with serde into your own rust structs? Because if I understand correctly, issue #117 would allow you mark your fields with a custom parsing function that parses the JSON fields directly.

probablykasper commented 10 months ago

But in general I think JSON specifies that numbers are f64, so using those as storage is not considered safe, and should probably be strings.

That's not the case, the JSON spec says that any sequence of digits is equally valid:

JSON is agnostic about the semantics of numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.

So it's no different than if you were to intermediately parse as f16. #117 sounds like it would solve it though, so that's great