apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.18k stars 1.17k forks source link

Parse real number literals as the Decimal type #12817

Open doupache opened 2 weeks ago

doupache commented 2 weeks ago

Is your feature request related to a problem or challenge?

During the fix for 12655, I found that the root cause was that we parse real number literals as f64. When performing coercion between f32 and f64, it can lead to significant precision problems.

As a solution, I proposed changing the default behavior to parse real number literals as Decimal. This not only helps avoid the precision issues between f32 and f64, but also aligns the default behavior with Postgres and DuckDB.

Postgres

SELECT 1.3 as real , pg_typeof(1.3) as type;  
real type
1.3 numeric

DuckDB

SELECT 1.3 as real , typeof(1.3) as type;  
real type
1.3 DECIMAL(2,1)

DataFusion

SELECT 1.3 as real , arrow_typeof(1.3) as type;  
real type
1.3 Float64

Fortunately, we have excellent support for Decimal, so the only change we need to make is setting the default value of sql_parser.parse_float_as_decimal to true. However, I believe this change could be breaking, as it alters the current behavior.

Therefore, it would be important to get broader community consensus before proceeding.

cc @alamb @jayzhan211 @jonahgao

Describe the solution you'd like

change sql_parser.parse_float_as_decimal to true

Describe alternatives you've considered

No response

Additional context

No response

andygrove commented 2 weeks ago

+1 for this. The current behavior is not consistent with ANSI SQL.

jonahgao commented 2 weeks ago

Makse sense to me. We should support scientific notation before this.

DataFusion CLI v42.0.0
> set datafusion.sql_parser.parse_float_as_decimal=true;
0 row(s) fetched.
Elapsed 0.001 seconds.

> select 1.1e10;
SQL error: ParserError("Cannot parse 11e10 as i128 when building decimal: invalid digit found in string")

Instead of parsing manually, I think we can use libraries like BigDecimal, or utils from arrow-rs.

austin362667 commented 2 weeks ago

+1 for this. Makes sense to me, and so is supporting scientific notation.

jayzhan211 commented 2 weeks ago

I think the challenge is that we need to make sure decimal is supported, so when we switch from float to decimal, every functions works as expected as well. Existing test might not covered all the types