StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.91k stars 1.79k forks source link

no reason when colocate false #1963

Closed colorfulu closed 2 years ago

colorfulu commented 2 years ago

Steps to reproduce the behavior (Required)

CREATE TABLE t1 ( id int(11) NULL COMMENT "", num int(11) NULL COMMENT "", time date NULL COMMENT "", value varchar(20) NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id, num, time) BUCKETS 3 PROPERTIES ( "replication_num" = "1", "colocate_with" = "group1", "in_memory" = "false", "storage_format" = "V2" ); CREATE TABLE t2 ( id int(11) NULL COMMENT "", num int(11) NULL COMMENT "", time date NULL COMMENT "", value varchar(20) NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id, num, time) BUCKETS 3 PROPERTIES ( "replication_num" = "1", "colocate_with" = "group1", "in_memory" = "false", "storage_format" = "V2" ); insert into t1 values (1,1,'2020-1022','hello'); insert into t2 values (1,1,'2020-1022','hello'); explain select * from t1 left join t2 on t1.id=t2.id; Expected behavior (Required) 4:HASH JOIN
| | join op: LEFT OUTER JOIN (PARTITIONED)
| | hash predicates:
| | colocate: false, reason: xxxxxxxxxxxxxxxxxx
| | equal join conjunct: 1: id = 5: id
| | use vectorized: true

Real behavior (Required) 4:HASH JOIN
| | join op: LEFT OUTER JOIN (PARTITIONED)
| | hash predicates:
| | colocate: false, reason:
| | equal join conjunct: 1: id = 5: id
| | use vectorized: true

StarRocks version (Required) You can get the StarRocks version by executing SQL select current_version() 527e5ae

Seaven commented 2 years ago

In your case, the reason about close colocate join is the columns of join on-predicate can't staisfy DISTRIBUTED KEY of colocate table. And in the new optimizer, it's difficult to show the reason, I think this is a low priority improvement.