apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.31k stars 3.21k forks source link

[Enhance] use predicate if in bitmap_union_count() can not hit rollup. #7008

Open zbtzbtzbt opened 2 years ago

zbtzbtzbt commented 2 years ago

Search before asking

Version

master

What's Wrong?

简单的说就是在bitmap_union_count中加了if判断后,rollup就命中不了了,导致查询很慢;不加可以命中。

What You Expected?

在bitmap_union_count里使用IF也可以命中rollup(详情见下面的 step 3.1)

How to Reproduce?

复现流程如下:

CREATE TABLE table_1
(
    k1 int NULL,
    k2 int NULL,
    k3 int NULL,
    v1 bitmap BITMAP_UNION NULL
)
AGGREGATE KEY(k1,k2,k3)
DISTRIBUTED BY HASH(k1) BUCKETS 10
rollup (rollup_1(k1,k2,v1))
PROPERTIES("replication_num" = "1");
insert into table_1 select 1 as k1, 2 as k2, 3 as k3, to_bitmap(10) as v1;
insert into table_1 select 2 as k1, 2 as k2, 3 as k3, to_bitmap(10) as v1;

-- result: PREAGGREGATION: OFF. rollup: table_1


- step3.2:bitmap_union_count里不使用IF,则可以命中rollup

EXPLAIN SELECT k1, bitmap_union_count(v1) AS v1_num FROM table_1 GROUP BY k1;
-- result: PREAGGREGATION: ON rollup: rollup_1



### Anything Else?

无

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
EmmyMiao87 commented 2 years ago

Now the selector cannot parse the detail column and aggregate column required by the query from expr So it cannot know the k2 is required detail column and the v1 is required bitmap agg column.

EmmyMiao87 commented 2 years ago

我有一个偏业务的小问题,什么类型的业务查询会有这种 bitmap_union_count(IF(k2=0,v1,NULL)) AS v1_num 需求呢? 能举个具体的业务例子嘛?

zbtzbtzbt commented 2 years ago

这里是用来计算每日新客数目

select
    dt
    ,bitmap_union_count(if(is_day_new_user=1,buy_user_id,null)) as newuser_num -- 新客数
    ,sum(sale_amt) as sale_amt -- 销售额
    ,sum(sale_amt)/bitmap_union_count(buy_user_id) as ARPU -- APRU
    ,sum(sale_amt)/sum(sale_num) as pcs_avg_price -- 件均价
    ,sum(sale_num)/bitmap_union_count(buy_user_id) as user_avg_sale_num -- 人均销售件数
    ,bitmap_union_count(buy_user_id) as buy_user_cnt -- 购买用户数
    ,sum(sale_num) as sale_num -- 销售件数
    ,bitmap_union_count(sale_main_order_id) as main_order_num -- 订单量
from grocery_doris_pdt.topic_pdt_pro_coupon_ord_sku_dt_period_v1
where dt=20211109
    -- 区域筛选
        and net_region_id = 350
    -- 业务模式
group by dt

下面这个sql很丑,看样子是根据用户类型分别计算什么指标

SELECT IF(GROUPING_ID(c_chain)=1,2,c_chain) AS plat_type,-- 0:MT,1:DP,2:ALL
   bitmap_union_count(IF(user_type=0 AND is_jm_user=1,jm_user_pk,NULL)) AS jm_trade_num,
   bitmap_union_count(IF(user_type=1,jm_user_pk,NULL)) AS jm_uv
 FROM dm_aggr_brandka_jmc_user_d
WHERE partition_date BETWEEN '2021-01-01' and '2021-11-01'
 AND pdc_brand_id = 184707
GROUP BY GROUPING SETS ((c_chain),());
zbtzbtzbt commented 2 years ago

还有另一个问题也反馈一下,sum(case when)这种写法,也属于聚合函数里面用表达式 当存在隐式转换时,也可能不会命中rollup,比如从int转化为bigint就命中不了了 猜测转换的原因是怕数据溢出吧

explain
select dt,
      sum(arrived_waybill_cnt) as arrived_waybill_cnt
      , sum(case when delivery_type_second_level_code=2 then arrived_waybill_cnt end) as arrived_waybill_cnt_jiameng
 from bi_peisong.app_rider_model_ctl_sa_view_day
where dt > 20211022 and dt<= 20211109
  and delivery_type_second_level_code in (2,4)
group by dt
EmmyMiao87 commented 2 years ago

还有另一个问题也反馈一下,sum(case when)这种写法,也属于聚合函数里面用表达式 当存在隐式转换时,也可能不会命中rollup,比如从int转化为bigint就命中不了了 猜测转换的原因是怕数据溢出吧

explain
select dt,
      sum(arrived_waybill_cnt) as arrived_waybill_cnt
      , sum(case when delivery_type_second_level_code=2 then arrived_waybill_cnt end) as arrived_waybill_cnt_jiameng
 from bi_peisong.app_rider_model_ctl_sa_view_day
where dt > 20211022 and dt<= 20211109
  and delivery_type_second_level_code in (2,4)
group by dt

是这样的,目前物化视图选择器不支持 sum(expr) 其中 expr 不是一个列的情况。 最主要的原因是,无法判断 expr 中的列物化视图是否满足要求。 比如 bitmap_union_count(if(is_day_new_user=1,buy_user_id,null)) 其中需要检测,is_day_new_user, buy_user_id 是否能正确的从物化视图中读取出。 由于 expr 千变万化,需要有表达式等价这类框架支持才能判断。顾目前都不支持聚合函数中出现表达式的物化视图选择。