apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Floating value literals without postfix should be parsed as decimal #4072

Open viirya opened 2 years ago

viirya commented 2 years ago

Is your feature request related to a problem or challenge?

Related to #4024, #4071.

A literal like 0.06 is parsed as a double in DataFusion. This produces counter-intuitive results such as 0.06 - 0.01 = 0.049999999 and 0.06 + 0.01 = 0.069999999. These results are correct for double arithmetic, though. (If you ask Spark to treat the literals as double (i.e., 0.06f), you will get the same result.)

Such literals are parsed as decimal in Spark. I think floating literals without a postfix should be parsed as decimal in DataFusion as well.
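To make the difference concrete, here is a minimal sketch comparing binary floating-point subtraction with exact decimal subtraction. The `Fixed` type and the helper functions are hypothetical and use only the standard library; they are not DataFusion's or Arrow's decimal implementation.

```rust
/// Hypothetical fixed-point value equal to `unscaled / 10^scale`.
/// This is an illustration only, not DataFusion's decimal type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Fixed {
    unscaled: i128,
    scale: u32,
}

impl Fixed {
    /// Subtract `other` from `self` after rescaling both to the larger scale.
    fn minus(self, other: Fixed) -> Fixed {
        let scale = self.scale.max(other.scale);
        let a = self.unscaled * 10_i128.pow(scale - self.scale);
        let b = other.unscaled * 10_i128.pow(scale - other.scale);
        Fixed { unscaled: a - b, scale }
    }
}

/// Double arithmetic: neither 0.06 nor 0.01 is exactly representable
/// in binary, so the difference is not the double closest to 0.05.
fn f64_difference() -> f64 {
    0.06_f64 - 0.01_f64
}

/// Decimal arithmetic: 6/100 - 1/100 is exactly 5/100.
fn decimal_difference() -> Fixed {
    let a = Fixed { unscaled: 6, scale: 2 };
    let b = Fixed { unscaled: 1, scale: 2 };
    a.minus(b)
}
```

Comparing `f64_difference()` with `0.05_f64` yields `false`, while `decimal_difference()` is exactly `Fixed { unscaled: 5, scale: 2 }`.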

andygrove commented 2 years ago

Thanks @viirya, I had forgotten about this ... this is the relevant code in sql/planner.rs, for reference:

// Parse number in sql string, convert to Expr::Literal
fn parse_sql_number(n: &str) -> Result<Expr> {
    // parse first as i64
    n.parse::<i64>()
        .map(lit)
        // if parsing as i64 fails try f64
        .or_else(|_| n.parse::<f64>().map(lit))
        .map_err(|_| {
            DataFusionError::from(ParserError(format!(
                "Cannot parse {} as i64 or f64",
                n
            )))
        })
}

viirya commented 2 years ago

Yes, I played with this a bit yesterday. With some tweaks, I can make it parse as decimal and correctly get 0.06 - 0.01 = 0.05.
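The exact tweak is not shown in this thread. As a rough sketch of one possible shape, the parser could try i64 first, then a plain decimal, and only fall back to f64 for forms such as exponent notation. `NumberLiteral` and `parse_decimal` below are hypothetical stand-ins (the real function returns `Result<Expr>` and would build a DataFusion literal), and the sign is assumed to be handled elsewhere in the planner.

```rust
/// Hypothetical stand-in for the literal expression the planner would build.
#[derive(Debug, Clone, PartialEq)]
enum NumberLiteral {
    Int64(i64),
    /// Value equals `unscaled / 10^scale`.
    Decimal { unscaled: i128, scale: u32 },
    Float64(f64),
}

/// Parse an unsigned literal made of digits with at most one period,
/// e.g. "0.06", ".5" or "1.", into (unscaled value, scale).
fn parse_decimal(n: &str) -> Option<(i128, u32)> {
    let (int_part, frac_part) = match n.split_once('.') {
        Some((i, f)) => (i, f),
        None => (n, ""),
    };
    // Require at least one digit overall, and nothing but ASCII digits.
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    if !int_part.bytes().chain(frac_part.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    // "0" + "06" becomes "006", which parses to 6 with scale 2.
    let digits = [int_part, frac_part].concat();
    let unscaled = digits.parse::<i128>().ok()?;
    Some((unscaled, frac_part.len() as u32))
}

/// Try i64, then decimal, then f64 (which still covers forms like "1e3").
fn parse_sql_number(n: &str) -> Result<NumberLiteral, String> {
    if let Ok(i) = n.parse::<i64>() {
        return Ok(NumberLiteral::Int64(i));
    }
    if let Some((unscaled, scale)) = parse_decimal(n) {
        return Ok(NumberLiteral::Decimal { unscaled, scale });
    }
    n.parse::<f64>()
        .map(NumberLiteral::Float64)
        .map_err(|_| format!("Cannot parse {} as i64, decimal, or f64", n))
}
```

With this ordering, `parse_sql_number("0.06")` produces a decimal with unscaled value 6 and scale 2 rather than a double, while `"1e3"` still falls through to f64.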

andygrove commented 2 years ago

Here is the spec for SQL numeric literals:

<signed numeric literal> ::=
[ <sign> ] <unsigned numeric literal>

<unsigned numeric literal> ::=
<exact numeric literal>
| <approximate numeric literal>

<exact numeric literal> ::=
<unsigned integer> [ <period> [ <unsigned integer> ] ]
| <period> <unsigned integer>

<sign> ::=
<plus sign>
| <minus sign>

<approximate numeric literal> ::=
<mantissa> E <exponent>

<mantissa> ::=
<exact numeric literal>

<exponent> ::=
<signed integer>

<signed integer> ::=
[ <sign> ] <unsigned integer>

<unsigned integer> ::=
<digit>...
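Read against this grammar, 0.06 is an exact numeric literal, and only literals with an exponent are approximate. A small sketch of that classification follows; the `LiteralKind` enum and function names are hypothetical, and accepting a lowercase `e` is an assumption beyond the quoted grammar.

```rust
/// Hypothetical classification following the grammar quoted above.
#[derive(Debug, PartialEq, Eq)]
enum LiteralKind {
    Exact,
    Approximate,
}

/// <unsigned integer> ::= <digit>...
fn is_unsigned_integer(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// <signed integer> ::= [ <sign> ] <unsigned integer>
fn is_signed_integer(s: &str) -> bool {
    let digits = s
        .strip_prefix('+')
        .or_else(|| s.strip_prefix('-'))
        .unwrap_or(s);
    is_unsigned_integer(digits)
}

/// <exact numeric literal>
fn is_exact(s: &str) -> bool {
    match s.split_once('.') {
        // <unsigned integer> with no period
        None => is_unsigned_integer(s),
        // <period> <unsigned integer>
        Some(("", frac)) => is_unsigned_integer(frac),
        // <unsigned integer> <period> [ <unsigned integer> ]
        Some((int, frac)) => {
            is_unsigned_integer(int) && (frac.is_empty() || is_unsigned_integer(frac))
        }
    }
}

/// <signed numeric literal>: returns None if the text matches neither form.
fn classify(s: &str) -> Option<LiteralKind> {
    // Drop one optional leading sign.
    let unsigned = s
        .strip_prefix('+')
        .or_else(|| s.strip_prefix('-'))
        .unwrap_or(s);
    if is_exact(unsigned) {
        return Some(LiteralKind::Exact);
    }
    // <mantissa> E <exponent>; lowercase 'e' is accepted as an assumption.
    let (mantissa, exponent) = unsigned.split_once(|c: char| c == 'E' || c == 'e')?;
    if is_exact(mantissa) && is_signed_integer(exponent) {
        Some(LiteralKind::Approximate)
    } else {
        None
    }
}
```

Here `classify("0.06")` is `Some(LiteralKind::Exact)` and `classify("-1.5E-3")` is `Some(LiteralKind::Approximate)`, which matches the proposal to treat literals without an exponent or postfix as decimal.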

liukun4515 commented 2 years ago

I agree with your suggestion: if a literal has no postfix, we should convert it to decimal by default.