databendlabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

Feature: TopN window operator #16394

Open sundy-li opened 3 weeks ago

sundy-li commented 3 weeks ago

Summary


Example Query:

select * from (select number, rank() over (partition by number % 3 order by number) c from numbers(1000000)) where c < 3;

🐳 :) explain  pipeline select * from (select number, rank() over( partition by number % 3   order by number ) c  from   numbers(1000000) ) where c < 3;
-[ EXPLAIN ]-----------------------------------
CompoundBlockOperator(Project) × 16
  TransformFilter × 16
    Transform Window × 16
      TransformWindowPartitionSort × 16
        TransformWindowPartitionSpillReader × 16
          Merge to Resize × 16
            Merge to TransformWindowPartitionBucket × 1
              TransformWindowPartitionSpillWriter × 16
                TransformWindowPartitionScatter × 16
                  CompoundBlockOperator(Map) × 16
                    NumbersSourceTransform × 16

11 rows explain in 0.044 sec. Processed 0 rows, 0B (0 row/s, 0B/s)

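The outer filter c < 3 keeps only the rows ranked 1 or 2 within each partition (partitioned by number % 3, ordered by number). We can therefore introduce a TopN window operator after CompoundBlockOperator(Map) that discards rows which can no longer satisfy the rank filter, before they reach the scatter, spill, and sort stages.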
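To make the idea concrete, here is a minimal Python sketch of such a pre-filter. It is not Databend's implementation; the function name topn_prefilter and its parameters are hypothetical, and it assumes numeric order keys and an ascending ORDER BY. For each partition it tracks the n smallest order keys seen so far and drops any row whose key is strictly greater than the current n-th smallest, since that row already has at least n rows ahead of it and so its rank() must exceed n. Rows that tie with the threshold are kept because rank() assigns equal rank to ties.

```python
import heapq
from collections import defaultdict


def topn_prefilter(rows, partition_key, order_key, n):
    """Yield only the rows that can still have rank() <= n in their partition.

    Hypothetical helper for illustration. Assumes numeric order keys and an
    ascending sort order. The window function and the final filter must
    still run downstream: this only prunes rows early.
    """
    # partition -> max-heap (stored as negated keys) of the n smallest keys seen
    heaps = defaultdict(list)
    for row in rows:
        heap = heaps[partition_key(row)]
        key = order_key(row)
        if len(heap) < n:
            # Fewer than n rows seen in this partition: the row may qualify.
            heapq.heappush(heap, -key)
            yield row
        elif key <= -heap[0]:
            # Not worse than the current n-th smallest key: keep the row.
            if key < -heap[0]:
                # Strictly better: tighten the threshold.
                heapq.heapreplace(heap, -key)
            yield row
        # Otherwise at least n rows sort strictly before this one: drop it.


# Mirrors the example query with a smaller input: rank() < 3 means n = 2.
survivors = list(topn_prefilter(range(1000), lambda x: x % 3, lambda x: x, 2))
print(survivors)
```

With ascending input, as produced here, only the first two rows of each of the three partitions survive. With descending input every row improves on the threshold and passes through, so this is a best-effort reduction rather than an exact filter.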

This optimization also applies to Q44 of TPC-DS.

sundy-li commented 3 weeks ago

cc @Dousir9 want this optimization.

assigned to @forsaken628