StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
In the example as follows, count(distinct user_id) can converted into count(user_id) to speedup, but the optimizer fails.
-- prepare data
create table t0 (user_id int, value int) primary key(user_id) properties("replication_num"="1");
insert into t0 values(1,0),(2,1),(3,0),(4,1);
-- Q1
explain costs select case when(value=1) then 'A' else 'B' end as flag, count(distinct user_id) from t0 group by 1;
we can rewrite Q1 as follows when column used by count distinct is key column of primary key table.
-- Q2
select case when(value=1) then 'A' else 'B' end as flag, count(user_id) from t0 group by 1;
It seems that GroupByCountDistinctRewriteRule should be updated to support this conversion.
Enhancement
In the example as follows, count(distinct user_id) can converted into count(user_id) to speedup, but the optimizer fails.
we can rewrite Q1 as follows when column used by count distinct is key column of primary key table.
It seems that GroupByCountDistinctRewriteRule should be updated to support this conversion.