apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.33k stars 1.2k forks source link

Fix Duplicated filters within (filter(TableScan)) plan for unparser #13422

Open Sevenannn opened 1 week ago

Sevenannn commented 1 week ago

Which issue does this PR close?

N/A

Rationale for this change

For query

        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                        c_custkey = o_custkey
                    and o_comment not like '%special%requests%'
        group by
            c_custkey

The logical plan is

+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | BytesProcessedNode                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|               |   Federated                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|               |  Projection: customer.c_custkey, count(orders.o_orderkey)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|               |   Aggregate: groupBy=[[customer.c_custkey]], aggr=[[count(orders.o_orderkey)]]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|               |     Left Join:  Filter: customer.c_custkey = orders.o_custkey                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|               |       TableScan: customer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|               |       Filter: orders.o_comment NOT LIKE Utf8("%special%requests%")                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|               |         TableScan: orders, partial_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")]              

The rewritten query will be: SELECT customer.c_custkey, count(orders.o_orderkey) FROM customer LEFT JOIN orders ON ((customer.c_custkey = orders.o_custkey) AND (orders.o_comment NOT LIKE '%special%requests%' AND orders.o_comment NOT LIKE '%special%requests%')) GROUP BY customer.c_custkey

Under the current approach, the filter orders.o_comment NOT LIKE Utf8("%special%requests%") will occur twice in final query, although this has no effect on query result correctness, it brings performance overhead by including duplicated conditions.

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

alamb commented 3 days ago

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look