apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Eliminate filter when `pushdown_filters` is enabled #7688

Closed Dandandan closed 1 month ago

Dandandan commented 1 year ago

Is your feature request related to a problem or challenge?

When `pushdown_filters` is enabled, DataFusion should be able to eliminate the subsequent filter. With the option enabled for the TPC-H benchmark, the `FilterExec` and the `l_shipdate` projection are still present in the plans.

For example, in query 3 we can still see the filter above the scan, even though the scan carries the same predicate:

FilterExec: l_shipdate@3 > 9204
  ParquetExec: file_groups={2 groups: [[...]]},
    projection=[l_orderkey, l_extendedprice, l_discount, l_shipdate], predicate=l_shipdate@10 > 9204, pruning_predicate=l_shipdate_max@0 > 9204

Describe the solution you'd like

Remove the filter when `pushdown_filters` is enabled.

We probably need to make some changes to `TableProvider` and `FileFormat` so that the filter can be removed depending on what the file format supports.
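To make the idea concrete, here is a minimal toy sketch of the rewrite, not DataFusion's actual implementation. The `Plan` and `Pushdown` types and `eliminate_redundant_filter` are hypothetical names made up for illustration. `Pushdown` loosely mirrors the `Unsupported` / `Inexact` / `Exact` distinction that DataFusion's `TableProviderFilterPushDown` draws. The assumption is that a filter is only safe to drop when the scan reports that it applies the same predicate exactly.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.

/// How faithfully a scan applies a pushed-down predicate.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pushdown {
    /// The scan ignores the predicate.
    Unsupported,
    /// The scan may still return rows that fail the predicate
    /// (for example, statistics-based pruning only).
    Inexact,
    /// The scan returns only rows that satisfy the predicate.
    Exact,
}

/// A tiny physical plan: a scan, optionally wrapped in filters.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { predicate: Option<String>, pushdown: Pushdown },
    Filter { predicate: String, input: Box<Plan> },
}

/// Drop a Filter when the scan directly beneath it already applies
/// the identical predicate exactly; otherwise keep the Filter.
fn eliminate_redundant_filter(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            // Rewrite the child first so nested filters are handled too.
            let input = eliminate_redundant_filter(*input);
            let redundant = matches!(
                &input,
                Plan::Scan { predicate: Some(p), pushdown: Pushdown::Exact }
                    if *p == predicate
            );
            if redundant {
                input
            } else {
                Plan::Filter { predicate, input: Box::new(input) }
            }
        }
        other => other,
    }
}
```

In this model a `Filter` over an `Exact` scan with the same predicate collapses to the bare scan, while an `Inexact` scan keeps its filter. Narrowing the projection (dropping `l_shipdate` once nothing above the scan needs it) is a separate step that this sketch does not model.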

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 1 month ago

I believe this is a duplicate of https://github.com/apache/datafusion/issues/4028, which @itsjunetime completed in https://github.com/apache/datafusion/pull/12135

Tests are here https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt

Let me know if I got that wrong.