delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Parse Decimal overflow #2974

Closed gruuya closed 6 days ago

gruuya commented 3 weeks ago

Environment

Delta-rs version: 0.20.1

Binding: Rust

Environment:


Bug

What happened: When converting a Parquet Statistics::FixedLenByteArray value for a Decimal(precision, scale) to the internal f64-based representation, a rounding error can sometimes produce an output value whose integer part exceeds the allotted space (i.e. it has more than precision - scale digits).

In turn, this results in an error such as Parser error: parse decimal overflow (1e32) when the stats are parsed from the logs.
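
The rounding can be reproduced outside delta-rs with a minimal sketch. It assumes the conversion amounts to decoding the bytes as a big-endian two's-complement i128, casting that to f64, and dividing by 10^scale; the helper names `naive_decimal_to_f64` and `integer_digits` are hypothetical and not part of the delta-rs API.

```rust
/// Hypothetical helper: decode a 16-byte big-endian two's-complement
/// unscaled decimal and convert it to f64 the naive way (assumed here to
/// mirror the conversion described above).
fn naive_decimal_to_f64(bytes: [u8; 16], scale: u32) -> f64 {
    let unscaled = i128::from_be_bytes(bytes);
    // The cast to f64 rounds to the nearest representable value,
    // which is where the extra digit can appear.
    (unscaled as f64) / (10_i128.pow(scale) as f64)
}

/// Number of decimal digits in the integer part of `val`.
fn integer_digits(val: f64) -> usize {
    (val.trunc().abs() as i128).to_string().len()
}
```

With the first byte sequence from the reproduction cases below, the decoded unscaled value is 10^38 - 1, the largest value a Decimal(38, 6) can hold. Its integer part should have at most 38 - 6 = 32 digits, but the naive conversion yields 1e32, whose integer part has 33.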

What you expected to happen: The conversion should respect the Decimal's precision/scale (even if that means the result is slightly less precise than the overflowing value).

How to reproduce it: The following cases (in test_stats_scalar_serialization) should pass:

            (
                simple_parquet_stat!(
                    Statistics::FixedLenByteArray,
                    FixedLenByteArray::from(vec![
                        75, 59, 76, 168, 90, 134, 196, 122, 9, 138, 34, 63, 255, 255, 255, 255
                    ])
                ),
                Some(LogicalType::Decimal {
                    scale: 6,
                    precision: 38,
                }),
                Value::from(9.999999999999999e31),
            ),
            (
                simple_parquet_stat!(
                    Statistics::FixedLenByteArray,
                    FixedLenByteArray::from(vec![
                        180, 196, 179, 87, 165, 121, 59, 133, 246, 117, 221, 192, 0, 0, 0, 1
                    ])
                ),
                Some(LogicalType::Decimal {
                    scale: 6,
                    precision: 38,
                }),
                Value::from(-9.999999999999999e31),
            ),

Otherwise, arrow raises a parse decimal overflow error for 1e32/-1e32.
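
One way to satisfy these cases, sketched under the assumption that stepping to the adjacent f64 of smaller magnitude is an acceptable loss of precision. The function name `decimal_to_f64_clamped` is hypothetical, and this is not necessarily the fix delta-rs adopted.

```rust
/// Hypothetical helper: convert an unscaled decimal to f64 while keeping
/// the integer part within `precision - scale` digits.
fn decimal_to_f64_clamped(unscaled: i128, precision: u32, scale: u32) -> f64 {
    let mut val = (unscaled as f64) / (10_i128.pow(scale) as f64);
    let max_digits = (precision - scale) as usize;

    // Count the digits of the truncated integer part (zero counts as none).
    let int_part = val.trunc().abs() as i128;
    let digits = if int_part == 0 {
        0
    } else {
        int_part.to_string().len()
    };

    if val.is_normal() && digits > max_digits {
        // Decrementing the bit pattern gives the neighbouring f64 that is
        // closer to zero; this holds for both positive and negative values.
        val = f64::from_bits(val.to_bits() - 1);
    }
    val
}
```

For the two cases above (unscaled values 10^38 - 1 and -(10^38 - 1) with precision 38 and scale 6), this returns 9.999999999999999e31 and -9.999999999999999e31 respectively. The sketch takes a single step only, on the assumption that rounding pushes the value at most one representable f64 past the boundary.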

More details: This also revealed a related issue: the commit effectively succeeds (the new table version is promoted), but the error is then thrown somewhere around the post-commit hooks, since that is when the faulty stat gets parsed.