apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.33k stars 1.2k forks source link

Datafusion downcasts decimal loosing precision #13492

Open himadripal opened 2 days ago

himadripal commented 2 days ago

Describe the bug

This cast throws an error

select  cast(12345.6789 as decimal(6,2));
Arrow error: Invalid argument error: 1234568 is too large to store in a Decimal128 of precision 6. Max is 999999

But if we cast it to a larger decimal and then cast it down to (6,2), then DataFusion cast works fine and looses precision. As shown below

> select cast( cast(12345.6789 as decimal(24,2)) as  decimal(6,2));
+---------------------+
| Float64(12345.6789) |
+---------------------+
| 1234.56             |
+---------------------+
1 row(s) fetched. 
Elapsed 0.001 seconds.

To Reproduce

try the above select query in datafusion-cli

Expected behavior

It should return null or throw an error as in this example

select  cast(12345.6789 as decimal(6,2));
Arrow error: Invalid argument error: 1234568 is too large to store in a Decimal128 of precision 6. Max is 999999

Additional context

while trying to enable decimal support in comet, Spark returns null in this cast but DataFusion returns a lowered precision value loosing the value of the decimal.

himadripal commented 2 days ago

In Postgres the following query resulted in an error


SELECT CAST(CAST(12345.6789 AS NUMERIC(24, 2)) AS NUMERIC(6, 2));
[22003] ERROR: numeric field overflow Detail: A field with precision 6, scale 2 must round to an absolute value less than 10^4.
findepi commented 1 day ago

But if we cast it to a larger decimal and then cast it down to (6,2), then DataFusion cast works fine and looses precision. As shown below

> select cast( cast(12345.6789 as decimal(24,2)) as  decimal(6,2));
+---------------------+
| Float64(12345.6789) |
+---------------------+
| 1234.56             |
+---------------------+

Losing decimal digits (rounding) in a cast is OK.

However here the cast changes value fundamentally, losing digits before the decimal dot, which is not OK. The original value was ≈12345, the cast result is ≈1234.