apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.15k stars 1.16k forks source link

Pushdown filters that do not reference unested columns #11016

Closed alamb closed 4 months ago

alamb commented 4 months ago

Maybe we can push down the part of the predicate that does not contain exec_columns? For example in query

select ua from (select unnest(a) as ua, 1 as b from t) where ua > 3 and b > 1;

We can push down b > 1.

Originally posted by @jonahgao in https://github.com/apache/datafusion/pull/10991#pullrequestreview-2128604718

alamb commented 4 months ago

This PR adds a test: https://github.com/apache/datafusion/pull/11017


# Could push the filter (column1 = 2) down below unnest
# https://github.com/apache/datafusion/issues/11016
query TT
explain select uc2, column1 from  (select unnest(column2) as uc2, column1 from v) where uc2 > 3 AND column1 = 2;
----
logical_plan
01)Projection: unnest(v.column2) AS uc2, v.column1
02)--Filter: unnest(v.column2) > Int64(3) AND v.column1 = Int64(2)
03)----Unnest: lists[unnest(v.column2)] structs[]
04)------Projection: v.column2 AS unnest(v.column2), v.column1
05)--------TableScan: v projection=[column1, column2]

The plan could look something like this

logical_plan
01)Projection: unnest(v.column2) AS uc2, v.column1
02)--Filter: unnest(v.column2) > Int64(3)
03)----Unnest: lists[unnest(v.column2)] structs[]
04)------Projection: v.column2 AS unnest(v.column2), v.column1
05)--------Filter: v.column1 = Int64(2)    <-- Note this filter is pushed down
06)----------TableScan: v projection=[column1, column2]