apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.48k stars 1.01k forks source link

Support unparsing the Value Plan of Array (List) to SQL String #11144

Open goldmedal opened 4 days ago

goldmedal commented 4 days ago

Is your feature request related to a problem or challenge?

In unparser/expr.rs, list scalar values haven't been supported yet.

ScalarValue::FixedSizeList(_a) => not_impl_err!("Unsupported scalar: {v:?}"),
ScalarValue::List(_a) => not_impl_err!("Unsupported scalar: {v:?}"),
ScalarValue::LargeList(_a) => not_impl_err!("Unsupported scalar: {v:?}"),

However, I found some problems with supporting it. Currently, array construction is implemented by a ScalarUDF called make_array. https://github.com/apache/datafusion/blob/ff116c3da69897358f210a3ea944c8e51dcb7b52/datafusion/sql/src/expr/value.rs#L145-L151 This might mean ScalarValue::List is never created from an AST expression node.

If I try to create the logical plan without the default SessionContext, like this:

let sql = r#"select [1,2,3]"#;

println!("SQL: {:?}", sql);
println!("********");
// parse the SQL
let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...
let ast = Parser::parse_sql(&dialect, sql).unwrap();
let statement = &ast[0];
println!("AST: {:?}", statement);

// create a logical query plan
let context_provider = MyContextProvider::new();
let sql_to_rel = SqlToRel::new(&context_provider);
let plan = match sql_to_rel.sql_statement_to_plan(statement.clone()) {
    Ok(plan) => plan,
    Err(e) => {
        println!("Error: {:?}", e);
        return;
    }
};
// show the planned SQL
let planned = match plan_to_sql(&plan) {
    Ok(sql) => sql,
    Err(e) => {
        println!("Error: {:?}", e);
        return;
    }
};

println!("unparsed to SQL:\n {}", planned);

I will get the error message:

Error: NotImplemented("array_expression feature is disabled, so you should implement the make_array UDF yourself")

Then, I tried to add make_array_udf in MyContextProvider. I can get the unparsed result like this:

unparsed to SQL:
SELECT make_array(1, 2, 3)

However, I think make_array isn't a common function for standard SQL. It may cause some problems when we try to fully push down the unparsed result to a specific data source.

Describe the solution you'd like

Ideally, we can round-trip the array construction in SQL select [1,2,3] through the following steps:

  1. Transform the AST::Expr::Array to ScalarValue::List (currently, it's the make_array scalar function).
  2. Implement the unparsing of ScalarValue::List.

However, I'm not sure why the array expression was disabled and make_array is used instead. Maybe it is due to some performance issues?

Describe alternatives you've considered

Another approach is to try unparsing the make_array() plan to AST::Expr::Array. It's tricky, but I think it could work.

Additional context

No response

yyy1000 commented 3 days ago

I think maybe you can enable array_expressions feature? https://github.com/apache/datafusion#crate-features

goldmedal commented 3 days ago

I think maybe you can enable array_expressions feature? https://github.com/apache/datafusion#crate-features

Thanks @yyy1000 I tried to enable the feature in 39.0.0 and the nightly version.

datafusion = {git = "https://github.com/apache/datafusion", branch = "main", features = ["array_expressions"]}

and

datafusion = {"version" = "39.0.0", features = ["array_expressions"]}

However, it doesn't work. I think array_expressions is for the array function, not the logical plan for the array value.