akubera / bigdecimal-rs

Arbitrary precision decimal crate for Rust
Other
275 stars 71 forks source link

[Possible Bug] - Display panics for large numbers #108

Closed DanielBauman88 closed 6 months ago

DanielBauman88 commented 1 year ago

Is this intentional? I think it'd be an improvement to avoid panics in display, and simply use a different display format for numbers that can't be displayed with the standard implementation.

EG: the following test will panic

    #[test]
    fn test_big_scale() {
        let min: i64 = i64::MIN;
        let str = &format!("1E{min}");
        BigDecimal::from_str(str).unwrap().to_string();
    }

Will panic with "attempt to subtract with overflow"

Same for example for

    #[test]
    fn test_small_scale() {
        let max: i64 = i64::MAX;
        let str = &format!("1E{}", max);
        BigDecimal::from_str(str).unwrap().to_string();
    }
akubera commented 1 year ago

Ok: edge case comes from subtracting the length of the "decimal" string from the scale, counting how many digits should come before the decimal point: https://github.com/akubera/bigdecimal-rs/blob/2633d23f4aca0cff9e91ed1021a606c3affd4666/src/lib.rs#L1957

Interesting, I wrote this unit test

        let dec = BigDecimal::new(1.into(), stdlib::i64::MAX);
        let expected = format!("1E{}", stdlib::i64::MAX);
        assert_eq!(dec.to_string(), expected);

and got memory allocation of 9223372036854775806 bytes failed

So... that's a bad sign.

akubera commented 1 year ago

It boils down to we don't support exponential syntax? In which case, anything close to i64 min or max (or even the 4GB of i32 min/max) will cause problems.

I don't know what heuristic to use for formatting decimals in exponential-form vs standard-form.

Definitely a bug that needs fixed, but I'm not too worried about it as it's been there for 6 years. It's on the todo list

DanielBauman88 commented 1 year ago

Another issue here which isn't related to the edge case, it's not obvious that display will allocate so much memory for large numbers.

EG - try those examples just below the limit. Display does huge allocations.

    #[test]
    fn test_big_scale() {
        let max: i64 = i64::MAX - 2;
        let str = &format!("1E{}", max);
        BigDecimal::from_str(str).unwrap().to_string();
    }

The to_string call here allocates 9223372036854775805 bytes of memory https://github.com/akubera/bigdecimal-rs/blob/main/src/lib.rs#L1694-L1694

Doesn't seem necessary for displaying. Would be better to have some limit where you display in exponential form.

edit: I see now that you came across the same problem here - https://github.com/akubera/bigdecimal-rs/issues/108#issuecomment-1643004689

ddimaria commented 6 months ago

We're running into the same issue in production when converting a large BigDecimal to a string. Design-wise, the to_string() method should return a result or there should be different ways of displaying the stringified BigDecimal that avoids panics. Are you open to a PR to fix this?

akubera commented 6 months ago

Does the branch in MR #121 fix the issue?

akubera commented 6 months ago

It did not.

This line had a potential out of bounds error if the exponent was near the i64 boundary, and had to be shifted forward or backwards past the boundaries when displaying in scientific notation: https://github.com/akubera/bigdecimal-rs/blob/dbf3b48ad210832d85a2f14a9667a7598dc7466b/src/impl_fmt.rs#L233

Casting the length and exp to i128's will solve that particular issue.

akubera commented 6 months ago

I casted a bunch of i64's to i128's before doing formatting stuff, and added some tests where the exponent is near the i64 boundary:

https://github.com/akubera/bigdecimal-rs/blob/72818b7809066470f47df76442102c45497260a5/src/impl_fmt.rs#L563-L572

Those test round-tripping, so everything rendered in .to_string() is able to be parsed (which wasn't true before).

akubera commented 6 months ago

Where did these issues come up? I know Python limits Decimal exponents to 10^18 (about 60 bits) and Java uses 32-bit a scale so I thought 64 bits would be enough for anybody. But I guess not?

Obviously these were bugs needing to be fixed, but I'm curious where they come up?

ddimaria commented 6 months ago

Is was while importing a user-submitted CSV. It comes in as text, then BigDecimal parses the exponential number successfully (signals that the text was a number). When we serialize the value we get the error. Our temp fix is to manually check for overflows.

akubera commented 6 months ago

Should be fixed in 0.4.3!