apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.65k stars 3.26k forks source link

[BUG] Join query result is unstable when the predicate is pushed down to exchange node #6525

Open qzsee opened 3 years ago

qzsee commented 3 years ago

Describe the bug

DDL

create table A(a1 bigint,a2 bigint)
ENGINE=OLAP
DISTRIBUTED BY HASH(a1) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

insert into A values(1,1),
                    (2,2),
                    (3,3),
                    (4,4),
                    (5,5);

create table B(b1 bigint,b2 bigint)
ENGINE=OLAP
DISTRIBUTED BY HASH(b1) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

insert into B values(2,1),
                    (3,2),
                    (4,3),
                    (5,4);

create table C(c1 bigint)
ENGINE=OLAP
DISTRIBUTED BY HASH(c1) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

insert into C values(1),
                    (2),
                    (3),
                    (4);

Join query

select 
  t1.a1,
  t2.b2,
  sum(t1.a2) 
from A t1

left join (
  select b1,b2 from B
) t2

on t1.a1=t2.b1

join (
  select c1 from C
) t3

on t2.b2 = t3.c1

group by
t1.a1,
t2.b2

Execute the SQL multiple times

Result1

+------+------+----------------+
| a1   | b2   | sum(`t1`.`a2`) |
+------+------+----------------+
|    3 |    2 |              3 |
|    2 |    1 |              2 |
|    4 |    3 |              4 |
+------+------+----------------+

Result2

+------+------+----------------+
| a1   | b2   | sum(`t1`.`a2`) |
+------+------+----------------+
|    3 |    2 |              3 |
|    2 |    1 |              2 |
|    4 |    3 |              4 |
|    5 |    4 |              5 |
+------+------+----------------+

在doris runtime filter版本之前。

hash join node 会把_probe_expr_ctxs中的谓词组成InPredicate谓词下推到exchange node节点。

InPredicate* in_pred = _pool->add(new InPredicate(node));
RETURN_IF_ERROR(in_pred->prepare(state, _probe_expr_ctxs[i]->root()->type()));
in_pred->add_child(Expr::copy(_pool, _probe_expr_ctxs[i]->root()));

InPredicate谓词添加了一个child,由_probe_expr_ctxs[i]->root() copy过去的。 在我的case中copy的是一个 if(TupleIsNull(2), NULL, <slot 4> region_id) 其中 TupleIsNull(2) 谓词会连带其中的 _tuple_idxs内容也会拷贝过去。 在TupleIsNull prepare的时候没有将_tuple_idxs集合做清空处理,导致在调用TupleIsNullPredicate::get_boolean_val方法的时候,因_tuple_idxs内容不正确,从而导致TupleIsNullPredicate::get_boolean_val方法调用不准确。然后就出现了我遇到的case。

runtime filter版本之前 貌似都存在这个问题

EmmyMiao87 commented 3 years ago

新版本的 runtime filter 取消了这部分逻辑了。所以新版本没问题。 旧版本的 runtime filter 确实有可能有你这个问题。

qzsee commented 3 years ago

新版本的 runtime filter 取消了这部分逻辑了。所以新版本没问题。 旧版本的 runtime filter 确实有可能有你这个问题。

是的,但是新版本的runtime filter,也有IN类型的过滤,这部分同样有复制谓词的过程,如果_tuple_idx不清空,不太清楚有些特别的查询是否继续命中这种bug。