apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.48k stars 1.01k forks source link

Error parsing valid SQL with aliases #2867

Open andygrove opened 1 year ago

andygrove commented 1 year ago

Describe the bug I am working on a SQL query fuzzer (see sqlfuzz crate) and I think it just found a bug in DataFusion.

The following SQL works fine in Spark but fails in DataFusion with ParserError("Expected ), found: __t404").

SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
FROM ((
    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
        FROM (test1)) __t398
    INNER JOIN
    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
        FROM (test1)) __t403
    ON __t398.__c395 = __t403.__c401) __t404)
WHERE __t404.__c395 != __t404.__c396

Spark:

scala> spark.sql("""SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
     | FROM ((
     |     (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
     |         FROM (test1)) __t398
     |     INNER JOIN
     |     (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
     |         FROM (test1)) __t403
     |     ON __t398.__c395 = __t403.__c401) __t404)
     | WHERE __t404.__c395 != __t404.__c396""").show
+------+------+------+------+
|__c394|__c395|__c396|__c397|
+------+------+------+------+
+------+------+------+------+

To Reproduce Use sqlfuzz

Expected behavior Query should work

Additional context None

waitingkuo commented 1 year ago

i tried it on sqlparser-rs, it doesn't work

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {

    let dialect = GenericDialect {}; // or AnsiDialect

    let sql = "SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
                FROM ((
                    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
                        FROM (test1)) __t398
                    INNER JOIN
                    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
                        FROM (test1)) __t403
                    ON __t398.__c395 = __t403.__c401) __t404)
                WHERE __t404.__c395 != __t404.__c396; ";

    let ast = Parser::parse_sql(&dialect, sql).unwrap();

    println!("AST: {:?}", ast);
} 
➜  sp git:(master) ✗ cargo run
   Compiling sp v0.1.0 (/Users/willy/willy/df-workspace/sp)
    Finished dev [unoptimized + debuginfo] target(s) in 0.30s
     Running `target/debug/sp`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParserError("Expected ), found: __t404")', src/main.rs:26:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

looks like the outer alias doesn't work. the following sql works

❯ SELECT __c394, __c395, __c396, __c397 
FROM ((
    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
        FROM (test1)) __t398
    INNER JOIN
    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
        FROM (test1)) __t403
    ON __t398.__c395 = __t403.__c401) )
WHERE __c395 != __c396;
0 rows in set. Query took 0.004 seconds.
waitingkuo commented 1 year ago

@andygrove i've made the pull request to sqlparser-rs to make nestedjoin support alias

I will update this https://github.com/apache/arrow-datafusion/blob/master/datafusion/sql/src/planner.rs#L731 after new sqlparser released

waitingkuo commented 1 year ago

@andygrove the parser error fixed in #3072

now it raised another error

DataFusion CLI v10.0.0
0 rows in set. Query took 0.038 seconds.
SchemaError(FieldNotFound { qualifier: Some("__t404"), name: "__c395", valid_fields: Some(["__t398.__c394", "__t398.__c395", "__t398.__c396", "__t398.__c397", "__t403.__c399", "__t403.__c400", "__t403.__c401", "__t403.__c402"]) })

I feel like this might have the similar issue as #3073