apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Enhancement] support distinct in analytic functions #36878

Open morrySnow opened 3 weeks ago

morrySnow commented 3 weeks ago

Description

Support distinct in the analytic functions count, sum, and avg. For example, given the table

create table t(id int, c1 int, c2 double, c3 string) properties('replication_num'='1');

the following query should be accepted:

select c1, c2, avg(distinct c1) over(partition by c2) from t;
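To make the requested behavior concrete, here is a minimal Python sketch of what the query above is expected to compute. It is only a reference model of the semantics, not Doris code: the function name `avg_distinct_over_partition` and the sample rows are made up for illustration, and it assumes the usual SQL rule that aggregates ignore NULL inputs.

```python
from collections import defaultdict


def avg_distinct_over_partition(rows):
    """Model of: select c1, c2, avg(distinct c1) over(partition by c2) from t.

    rows is a list of (c1, c2) tuples; None stands for SQL NULL.
    Returns one (c1, c2, avg) tuple per input row, in input order.
    """
    # Collect the distinct non-NULL c1 values of each c2 partition.
    distinct_by_partition = defaultdict(set)
    for c1, c2 in rows:
        if c1 is not None:
            distinct_by_partition[c2].add(c1)

    result = []
    for c1, c2 in rows:
        values = distinct_by_partition[c2]
        # A partition with no non-NULL c1 yields NULL, as avg does in SQL.
        avg = sum(values) / len(values) if values else None
        result.append((c1, c2, avg))
    return result


# Hypothetical sample data: partition 1.0 holds c1 values 1, 1, 3.
rows = [(1, 1.0), (1, 1.0), (3, 1.0), (5, 2.0)]
for row in avg_distinct_over_partition(rows):
    print(row)
```

Every row of a partition gets the same value, and the duplicate c1 = 1 in partition 1.0 is counted once, so that partition averages 1 and 3 rather than 1, 1 and 3.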

Solution

  1. Remove the restriction on distinct in analytic functions:
     1.1. https://github.com/apache/doris/blob/543576227db1521e66c2d32a6fa522c1a7a7aa61/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/LogicalPlanBuilder.java#L2140-L2145
     1.2. https://github.com/apache/doris/blob/543576227db1521e66c2d32a6fa522c1a7a7aa61/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/LogicalPlanBuilder.java#L2155-L2160
  2. Convert the functions to their distinct counterparts in ExtractAndNormalizeWindowExpression:
     2.1. count(distinct c1) over(partition by c2) to multi_distinct_count(c1) over(partition by c2)
     2.2. sum(distinct c1) over(partition by c2) to multi_distinct_sum(c1) over(partition by c2)
     2.3. avg(distinct c1) over(partition by c2) to cast(multi_distinct_sum(c1) over(partition by c2) as double) / cast(multi_distinct_count(c1) over(partition by c2) as double)
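The three conversions in step 2 can be sketched as a small expression rewrite. This is an illustrative Python model only: the real change would live in the Java rule ExtractAndNormalizeWindowExpression, and the classes `WindowAgg`, `CastAsDouble`, `Divide` and the helpers `rewrite_distinct_window` and `to_sql` are hypothetical names invented for this sketch, not Nereids APIs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class WindowAgg:
    """func([distinct] arg) over(partition by partition_by)."""
    func: str
    arg: str
    partition_by: str
    distinct: bool = False


@dataclass(frozen=True)
class CastAsDouble:
    child: "Expr"


@dataclass(frozen=True)
class Divide:
    left: "Expr"
    right: "Expr"


Expr = Union[WindowAgg, CastAsDouble, Divide]


def rewrite_distinct_window(expr: Expr) -> Expr:
    """Apply conversions 2.1 to 2.3; leave any other expression unchanged."""
    if not isinstance(expr, WindowAgg) or not expr.distinct:
        return expr
    count = WindowAgg("multi_distinct_count", expr.arg, expr.partition_by)
    total = WindowAgg("multi_distinct_sum", expr.arg, expr.partition_by)
    if expr.func == "count":
        return count
    if expr.func == "sum":
        return total
    if expr.func == "avg":
        # avg(distinct x) becomes sum-of-distinct / count-of-distinct.
        return Divide(CastAsDouble(total), CastAsDouble(count))
    raise ValueError(f"distinct is not supported for window function {expr.func}")


def to_sql(expr: Expr) -> str:
    """Render an expression back to SQL text for inspection."""
    if isinstance(expr, WindowAgg):
        distinct = "distinct " if expr.distinct else ""
        return f"{expr.func}({distinct}{expr.arg}) over(partition by {expr.partition_by})"
    if isinstance(expr, CastAsDouble):
        return f"cast({to_sql(expr.child)} as double)"
    return f"{to_sql(expr.left)} / {to_sql(expr.right)}"


print(to_sql(rewrite_distinct_window(WindowAgg("avg", "c1", "c2", distinct=True))))
```

Casting both operands to double in the avg case follows 2.3 as written above, so the division is not an integer division when c1 is an int column.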

cjj2010 commented 2 weeks ago

I want to try 2.1

morrySnow commented 2 weeks ago

@cjj2010 This issue cannot be split into smaller tasks for solving. Can you handle the entire issue?

cjj2010 commented 2 weeks ago

> @cjj2010 This issue cannot be split into smaller tasks for solving. Can you handle the entire issue?

Okay, I am willing to handle the entire issue. My current business scenario really needs this feature.

morrySnow commented 2 weeks ago

@cjj2010 Great! I will assign this issue to you.