Is your feature request related to a problem? Please describe.
Doris currently supports creating colocate tables and running colocate joins, but it does not take full advantage of colocation.
For example, when there is an Aggregation Node between the Hash Join Node and the Scan Node, a colocate join cannot be performed.
Likewise, an Aggregation Node, Sort Node, or Set Operation Node could be absorbed by its child node when the data distribution matches, but this is not supported yet.
Describe the solution you'd like
When the data distribution allows it, an upper-level operator can be absorbed by the lower-level operator beneath it,
which avoids unnecessary network transmission, serialization, and deserialization.
Describe alternatives you've considered
Colocate optimization mainly applies to the following four plan nodes:
Hash Join Node
Set Operation Node
Aggregation Node
Sort Node (in window function)
Step3: colocate aggregation node
Condition: The input partition of the aggregation node >= the data partition of the child fragment, that is, the group-by columns cover the distribution columns of the child fragment.
explain select k1, k2 from test_colocate where k1=1 group by k1, k2;
+-------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 2> `k1` | <slot 3> `k2` |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`test_colocate`.`k1`, `default_cluster:test`.`test_colocate`.`k2` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| UNPARTITIONED |
| |
| 1:AGGREGATE (update finalize) |
| | group by: `k1`, `k2` |
| | |
| 0:OlapScanNode |
| TABLE: test_colocate |
+-------------------------------------------------------------------------------------------------------------------------+
30 rows in set (0.011 sec)
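Reading >= as "covers", the aggregation condition amounts to a set-containment check. The sketch below is a hypothetical Python illustration, not Doris planner code: `can_colocate_aggregation`, `bucket_of`, and `distinct_groups` are made-up names, and Python's built-in `hash` stands in for the real bucketing function. On toy data it also checks that aggregating each bucket locally yields the same groups as aggregating all rows together, which is why no exchange is needed before the aggregation.

```python
from collections import defaultdict


def can_colocate_aggregation(group_by_cols, distribution_cols):
    """True when the group-by columns cover every distribution column."""
    return set(distribution_cols) <= set(group_by_cols)


def bucket_of(row, distribution_cols, num_buckets):
    """Toy bucketing (assumption): hash the distribution-column values."""
    return hash(tuple(row[c] for c in distribution_cols)) % num_buckets


def distinct_groups(rows, group_by_cols):
    """Rough equivalent of SELECT cols ... GROUP BY cols over one set of rows."""
    return {tuple(row[c] for c in group_by_cols) for row in rows}


rows = [
    {"k1": 1, "k2": 10},
    {"k1": 1, "k2": 10},
    {"k1": 1, "k2": 20},
    {"k1": 2, "k2": 10},
]
distribution_cols = ["k1", "k2"]
group_by_cols = ["k1", "k2"]

# Place every row in its bucket, as the storage layer would.
buckets = defaultdict(list)
for row in rows:
    buckets[bucket_of(row, distribution_cols, 3)].append(row)

# Aggregate each bucket on its own, then simply concatenate the results.
local_results = [distinct_groups(b, group_by_cols) for b in buckets.values()]
merged = set().union(*local_results)
```

Because every row of a group lands in the same bucket, no group appears in two local results, so the local outputs can be sent straight to the result sink without a merge step.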
Step4: sort node (in window function)
Condition: The sort columns of the sort node >= the data partition of the child fragment
In the example below, the sort columns are k1 and k2.
explain select k1, sum(k2) over(partition by k1 order by k2) from test_colocate;
+-------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 4> <slot 0> | <slot 3> |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 3:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`test_colocate`.`k1`, `default_cluster:test`.`test_colocate`.`k2` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 03 |
| UNPARTITIONED |
| |
| 2:ANALYTIC |
| | functions: [, sum(<slot 5> ), ] |
| | partition by: `k1` |
| | order by: <slot 5> ASC |
| | window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
| | |
| 1:SORT |
| | order by: <slot 4> <slot 0> ASC, <slot 5> ASC |
| | offset: 0 |
| | |
| 0:OlapScanNode |
| TABLE: test_colocate |
+-------------------------------------------------------------------------------------------------------------------------+
36 rows in set (0.028 sec)
Colocate set operation node
Condition1: The data partition of every child fragment >= the input partition of the set operation node
Condition2: No child fragment contains an exchange node, which means the data has not been rehashed.
Condition3: All scan nodes are in the same colocate group
In the example below, t1, t2, and t3 are in the same colocate group, and the intersect column id is the same as the distribution column id.
explain select id from t1 intersect select id from t2 intersect select id from t3;
+----------------------------------------------------------------------------------+
| Explain String |
+----------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 3> `id` `id` `id` |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`t1`.`id` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 0:INTERSECT |
| | colocate=true |
| | |
| |----2:OlapScanNode |
| | TABLE: t2 |
| | |
| |----3:OlapScanNode |
| | TABLE: t3 |
| | |
| 1:OlapScanNode |
| TABLE: t1 |
+----------------------------------------------------------------------------------+
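The three set operation conditions can be sketched as a single predicate. This is a hypothetical Python illustration rather than the planner's actual logic: `ChildFragment` and `can_colocate_set_operation` are invented names. Condition 1 is checked in the form the example exhibits (each child's set operation columns equal its distribution columns), since equality satisfies the >= condition under either reading.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChildFragment:
    """Hypothetical summary of one child of a set operation node."""
    partition_cols: List[str]      # data partition of the child fragment
    set_op_cols: List[str]         # columns this child feeds into the set operation
    has_exchange: bool             # True if the data was rehashed on the way
    colocate_group: Optional[str]  # colocate group of the scanned table, if any


def can_colocate_set_operation(children):
    """Check conditions 1 to 3 for every child of the set operation."""
    if not children:
        return False
    # Condition 1 (strict reading): set operation columns match the data partition.
    if any(c.partition_cols != c.set_op_cols for c in children):
        return False
    # Condition 2: no exchange node, so the data has not been rehashed.
    if any(c.has_exchange for c in children):
        return False
    # Condition 3: all scan nodes belong to one and the same colocate group.
    groups = {c.colocate_group for c in children}
    return len(groups) == 1 and None not in groups


# Mirrors the example: t1, t2, t3 distributed by id, intersected on id.
t1 = ChildFragment(["id"], ["id"], False, "group1")
t2 = ChildFragment(["id"], ["id"], False, "group1")
t3 = ChildFragment(["id"], ["id"], False, "group1")
colocate = can_colocate_set_operation([t1, t2, t3])
```

With all three children passing, the plan keeps the INTERSECT and the three scans in one fragment, as the `colocate=true` line in the output indicates.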
Additional context
The colocate plan also has some bad cases.
For example, when the data is skewed, the query is affected more by the skew, because colocate removes the rehash step that would otherwise redistribute the rows.
In this case, it is best to fix the data skew first by choosing a reasonable distribution column.
Alternatively, colocate can be turned off through session configuration.
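The effect of skew can be estimated before relying on a colocate plan. The following is a rough Python sketch with assumed names (`bucket_row_counts`, `skew_ratio`) and a plain modulo in place of the real bucketing hash. It compares the largest bucket with the average bucket, since without a rehash the instance holding the largest bucket does that much more work.

```python
from collections import Counter


def bucket_row_counts(keys, num_buckets):
    """Rows per bucket under a toy modulo bucketing (assumption, not Doris's hash)."""
    counts = Counter(key % num_buckets for key in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


def skew_ratio(counts):
    """Largest bucket divided by the average bucket; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# 1000 evenly spread keys versus 1000 keys where one value dominates.
uniform_keys = list(range(1000))
skewed_keys = [7] * 900 + list(range(100))

uniform = bucket_row_counts(uniform_keys, 10)
skewed = bucket_row_counts(skewed_keys, 10)

print(skew_ratio(uniform))
print(skew_ratio(skewed))
```

A ratio well above 1 suggests that the distribution column is a poor choice for colocate execution.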