StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0
8.67k stars 1.75k forks

Transform count(distinct pk) into count(pk) #50974

Open satanson opened 6 days ago

satanson commented 6 days ago

Enhancement

In the following example, count(distinct user_id) can be converted into count(user_id) to speed up the query, but the optimizer does not currently perform this rewrite.

-- prepare data
create table t0 (user_id int, value int) primary key(user_id) properties("replication_num"="1");
insert into t0 values(1,0),(2,1),(3,0),(4,1);
-- Q1
explain costs select case when(value=1) then 'A' else 'B' end as flag, count(distinct user_id) from t0 group by 1;

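We can rewrite Q1 as follows when the column used by count(distinct) is the key column of a primary key table. Here user_id is the entire primary key, so its values are unique and every group already contains each user_id at most once.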

-- Q2
select case when(value=1) then 'A' else 'B' end as flag, count(user_id) from t0 group by 1;
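
One way to sanity-check the equivalence on the sample data is to compute both aggregates in a single query and compare them per group. This is only a sketch against the t0 table defined above; the column aliases cnt_distinct and cnt_plain are names chosen here for illustration, not part of the original report.

```sql
-- Sketch: compare the Q1 and Q2 aggregates side by side on t0.
-- cnt_distinct and cnt_plain are illustrative aliases.
select
    case when(value=1) then 'A' else 'B' end as flag,
    count(distinct user_id) as cnt_distinct,  -- aggregate used by Q1
    count(user_id) as cnt_plain               -- aggregate used by Q2
from t0
group by 1
order by 1;
-- With the four rows inserted above, flag 'A' covers user_id 2 and 4 and
-- flag 'B' covers user_id 1 and 3, so both columns should be 2 for each flag.
```

Agreement on one data set does not prove the rewrite in general; the argument rests on the uniqueness of user_id as the primary key.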

It seems that GroupByCountDistinctRewriteRule should be updated to support this conversion.
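
Any such update would presumably need to check that the counted column is the whole primary key, not just one of several key columns. The sketch below uses a hypothetical table t1 (not part of this issue) with a composite primary key to show a case where the two aggregates differ and the rewrite would therefore be incorrect.

```sql
-- Hypothetical table for illustration: user_id is only part of the primary key,
-- so the same user_id can appear in several rows.
create table t1 (user_id int, event_id int, value int) primary key(user_id, event_id) properties("replication_num"="1");
insert into t1 values(1,1,0),(1,2,0),(2,1,1);
-- user_id 1 appears twice, so the two aggregates disagree:
-- count(distinct user_id) is 2, while count(user_id) is 3.
select count(distinct user_id) as cnt_distinct, count(user_id) as cnt_plain from t1;
```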

danielhumanmod commented 6 days ago

Hi @satanson, I can take a look and add support for this enhancement.

satanson commented 5 days ago

@danielhumanmod Ok