apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.48k stars 1.01k forks source link

Optimize planner to avoid excessive schema transform functions #9144

Open comphead opened 4 months ago

comphead commented 4 months ago

Is your feature request related to a problem or challenge?

it was found that the planner calling schema transform functions excessively.

Experiment, lets take a simple query

        let sql = "select a, a + 1 b from (select 1 a union all select 2 a) x";

And increase rows in the x subquery by adding more and more rows, and we can see that schema function grows with every record

rows new_with_metadata merge
2 58 35
3 131 83
4 165 109

IMHO That is not expected, once the plan has built the schema calls should not increase with every new record in the dataset

Describe the solution you'd like

Ideally resolve excessive, more real is to reduce such calls

Describe alternatives you've considered

No response

Additional context

Follow up on https://github.com/apache/arrow-datafusion/pull/9104 investigations

comphead commented 4 months ago

The same basically happens with table

create table t1 as (select 1 a union all select 2 a )      
let sql: &str = "select a, a + 1 b, a+2 from t1";

It seems planner phase like schema transform affects execution phase

comphead commented 3 months ago

Update: reading from table or parquet doesn't cause exponential schema calls growth. But Queries with literals causes like below does

let sql = "select a, a + 1, a+2 b from (select 1 a union all select 2 a union all select 3 a union all select 4 a union all select 5 a union all select 6 a)";

But this is rare and won't be that critical.

However for reading parquet and tables another problem arises with wild card expansion

adding outer select * adds 50 new calls to with_new_metadata per each *

    let sql = "select * from (select * from (select * from (select * from (select * from (select a, a+1, a+2, a+3, a+4, a+5, a+6, a+7, a+8, a+9 from t1)))))";

makes 359 calls