Open jackwener opened 1 year ago
Could you elaborate on the expected behaviour here? From the original discussion, regarding this test:
I did an explain plan:
+------------------------------------------------------------+-----------------------------------------------------------------+
| plan_type | plan |
+------------------------------------------------------------+-----------------------------------------------------------------+
| initial_logical_plan | Projection: COUNT(UInt8(1)) |
| | Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]] |
| | Union |
| | Projection: t.a |
| | TableScan: t |
| | Projection: t.a |
| | TableScan: t |
| logical_plan after inline_table_scan | SAME TEXT AS ABOVE |
...
| logical_plan after common_sub_expression_eliminate | SAME TEXT AS ABOVE |
| logical_plan after push_down_projection | Projection: COUNT(UInt8(1)) |
| | Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]] |
| | Union |
| | TableScan: t projection=[a] |
| | TableScan: t projection=[a] |
| logical_plan after inline_table_scan | SAME TEXT AS ABOVE |
...
| logical_plan after push_down_projection | SAME TEXT AS ABOVE |
| logical_plan | Projection: COUNT(UInt8(1)) |
| | Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]] |
| | Union |
| | TableScan: t projection=[a] |
| | TableScan: t projection=[a] |
What would be the expected logical plan, if it's not this?
Hi @Jefffrey, actually I am not sure whether we should do this. It is just something appearing in my mind one day.
(sorry it should be the aggregation functions
but not empty projection
in the title)
For union all
, we can push down the aggregation functions to the child plans. For example,
for a query select count(*) from (select * from t1 union select * from t2)
, we can optimize the logical plan to count each child plan and then sum them together.
Projection: Sum(col1) as COUNT(UInt8(1))
Aggregate: groupBy=[[]], aggr=[[Sum(col1)]]
Union
Projection: COUNT(UInt8(1)) as col1
Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]
TableScan: t projection=[a]
Projection: COUNT(UInt8(1)) as col1
Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]
TableScan: t projection=[a]
And it is similar for min, max, avg ...
Ok I see what you mean now @HaoYang670
I suppose this could be a new logical optimizer rule push_down_aggregate
, could be worth experimenting with I suppose.
_Originally posted by @HaoYang670 in https://github.com/apache/arrow-datafusion/pull/4753#discussion_r1058696036_