Open jonathanc-n opened 2 days ago
@jonathanc-n, @goldmedal - thank you, I've reviewed this change and it seems it brings back the following issue (there is additional context of why filtering added this way produces incorrect result) https://github.com/apache/datafusion/pull/13132 . I really like the change but can we improve this to see if we can wrap TableScan with Filter as a subquery when it is required
Example query.
select
c_custkey,
count(o_orderkey)
from
customer left join orders on c_custkey = o_custkey and o_comment not like '%special%requests%'
group by
c_custkey
| | Projection: customer.c_custkey, count(orders.o_orderkey) |
| | Aggregate: groupBy=[[customer.c_custkey]], aggr=[[count(orders.o_orderkey)]] |
| | Left Join: Filter: customer.c_custkey = orders.o_custkey |
| | TableScan: customer projection=[c_custkey] |
| | TableScan: orders projection=[o_orderkey, o_custkey], full_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")]
Result:
1489 29
1269 0
652 24
273 0
51 0
...
select
"customer"."c_custkey",
count("orders"."o_orderkey")
from
"customer"
left join "orders" on
(("customer"."c_custkey" = "orders"."o_custkey")
and "orders"."o_comment" not like '%special%requests%')
group by
"customer"."c_custkey"
Result:
1489 29
1269 0
652 24
273 0
51 0
...
select
"customer"."c_custkey",
count("orders"."o_orderkey")
from
"customer"
left join "orders" on
("customer"."c_custkey" = "orders"."o_custkey")
where
"orders"."o_comment" not like '%special%requests%'
group by
"customer"."c_custkey"
Result
1489 29
652 24
1091 1
70 15
839 14
If intent for filter to be moved it must be wrapped as subquery in this case:
select
c_custkey,
count(o_orderkey)
from
customer left join (select * from orders where o_comment not like '%special%requests%') on c_custkey = o_custkey
group by
c_custkey
Result
1489 29
1269 0
652 24
273 0
51 0
Which issue does this PR close?
Closes #13156 .
Rationale for this change
What changes are included in this PR?
Pushes down filter to tablescan instead of having it apply on the join in unparsing.
Are these changes tested?
Changed the previous test
Are there any user-facing changes?