Closed: lgbo-ustc closed this issue 4 days ago
The physical plan on spark-3.3 is

```
CHNativeColumnarToRow
+- TakeOrderedAndProjectExecTransformer (limit=100, orderBy=[rnk#5 ASC NULLS FIRST], output=[rnk#5,best_performing#11,worst_performing#12])
+- ^(11) ProjectExecTransformer [rnk#5, i_product_name#66 AS best_performing#11, i_product_name#111 AS worst_performing#12]
+- ^(11) CHBroadcastHashJoinExecTransformer [item_sk#6L], [i_item_sk#90L], Inner, BuildRight, false
:- ^(11) ProjectExecTransformer [rnk#5, item_sk#6L, i_product_name#66]
: +- ^(11) CHBroadcastHashJoinExecTransformer [item_sk#1L], [i_item_sk#45L], Inner, BuildRight, false
: :- ^(11) ProjectExecTransformer [item_sk#1L, rnk#5, item_sk#6L]
: : +- ^(11) CHSortMergeJoinExecTransformer [rnk#5], [rnk#10], Inner, false
: : :- ^(11) SortExecTransformer [rnk#5 ASC NULLS FIRST], false, 0
: : : +- ^(11) ProjectExecTransformer [item_sk#1L, rnk#5]
: : : +- ^(11) FilterExecTransformer ((rnk#5 < 11) AND isnotnull(item_sk#1L))
: : : +- ^(11) WindowExecTransformer [rank(rank_col#2) windowspecdefinition(rank_col#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rnk#5], [rank_col#2 ASC NULLS FIRST]
: : : +- ^(11) SortExecTransformer [rank_col#2 ASC NULLS FIRST], false, 0
: : : +- ^(11) InputIteratorTransformer[item_sk#1L, rank_col#2]
: : : +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=896], [shuffle_writer_type=hash], [OUTPUT] List(item_sk:LongType, rank_col:DecimalType(11,6))
: : : +- ^(6) FilterExecTransformer (isnotnull(rank_col#2) AND (cast(rank_col#2 as decimal(13,7)) > CheckOverflow((promote_precision(cast(0.9 as decimal(11,6))) * promote_precision(Subquery scalar-subquery#4, [id=#222])), DecimalType(13,7))))
: : : : +- Subquery scalar-subquery#4, [id=#222]
: : : : +- CHNativeColumnarToRow
: : : : +- ^(2) ProjectExecTransformer [cast((avg(UnscaledValue(ss_net_profit#142))#116 / 100.0) as decimal(11,6)) AS rank_col#3]
: : : : +- ^(2) HashAggregateTransformer(keys=[ss_store_sk#128L], functions=[avg(UnscaledValue(ss_net_profit#142))], isStreamingAgg=false)
: : : : +- ^(2) InputIteratorTransformer[ss_store_sk#128L, sum#204, count#205L]
: : : : +- ColumnarExchange hashpartitioning(ss_store_sk#128L, 5), ENSURE_REQUIREMENTS, [plan_id=215], [shuffle_writer_type=hash], [OUTPUT] ArrayBuffer(ss_store_sk:LongType, sum:DoubleType, count:LongType)
: : : : +- ^(1) HashAggregateTransformer(keys=[ss_store_sk#128L], functions=[partial_avg(_pre_0#206L)], isStreamingAgg=false)
: : : : +- ^(1) ProjectExecTransformer [ss_store_sk#128L, ss_net_profit#142, UnscaledValue(ss_net_profit#142) AS _pre_0#206L]
: : : : +- ^(1) FilterExecTransformer ((isnotnull(ss_store_sk#128L) AND (ss_store_sk#128L = cast(2 as bigint))) AND isnull(ss_hdemo_sk#126L))
: : : : +- ^(1) NativeFileScan parquet tpcds_pq100.store_sales[ss_hdemo_sk#126L,ss_store_sk#128L,ss_net_profit#142] Batched: true, DataFilters: [isnotnull(ss_store_sk#128L), isnull(ss_hdemo_sk#126L)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/data3/liangjiabiao/docker/local_gluten/spark-3.3.2-bin-hadoop3/s..., PartitionFilters: [], PushedFilters: [IsNotNull(ss_store_sk), IsNull(ss_hdemo_sk)], ReadSchema: struct<ss_hdemo_sk:bigint,ss_store_sk:bigint,ss_net_profit:decimal(7,2)>
: : : +- ^(6) ProjectExecTransformer [ss_item_sk#22L AS item_sk#1L, cast((avg(UnscaledValue(ss_net_profit#44))#114 / 100.0) as decimal(11,6)) AS rank_col#2]
: : : +- ^(6) HashAggregateTransformer(keys=[ss_item_sk#22L], functions=[avg(UnscaledValue(ss_net_profit#44))], isStreamingAgg=false)
: : : +- ^(6) InputIteratorTransformer[ss_item_sk#22L, sum#196, count#197L]
: : : +- ColumnarExchange hashpartitioning(ss_item_sk#22L, 5), ENSURE_REQUIREMENTS, [plan_id=889], [shuffle_writer_type=hash], [OUTPUT] ArrayBuffer(ss_item_sk:LongType, sum:DoubleType, count:LongType)
: : : +- ^(5) HashAggregateTransformer(keys=[ss_item_sk#22L], functions=[partial_avg(_pre_2#216L)], isStreamingAgg=false)
: : : +- ^(5) ProjectExecTransformer [ss_item_sk#22L, ss_net_profit#44, UnscaledValue(ss_net_profit#44) AS _pre_2#216L]
: : : +- ^(5) FilterExecTransformer (isnotnull(ss_store_sk#30L) AND (ss_store_sk#30L = cast(2 as bigint)))
: : : +- ^(5) NativeFileScan parquet tpcds_pq100.store_sales[ss_item_sk#22L,ss_store_sk#30L,ss_net_profit#44] Batched: true, DataFilters: [isnotnull(ss_store_sk#30L)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/data3/liangjiabiao/docker/local_gluten/spark-3.3.2-bin-hadoop3/s..., PartitionFilters: [], PushedFilters: [IsNotNull(ss_store_sk)], ReadSchema: struct<ss_item_sk:bigint,ss_store_sk:bigint,ss_net_profit:decimal(7,2)>
: : +- ^(11) SortExecTransformer [rnk#10 ASC NULLS FIRST], false, 0
: : +- ^(11) ProjectExecTransformer [item_sk#6L, rnk#10]
: : +- ^(11) FilterExecTransformer ((rnk#10 < 11) AND isnotnull(item_sk#6L))
: : +- ^(11) WindowExecTransformer [rank(rank_col#7) windowspecdefinition(rank_col#7 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rnk#10], [rank_col#7 DESC NULLS LAST]
: : +- ^(11) SortExecTransformer [rank_col#7 DESC NULLS LAST], false, 0
: : +- ^(11) InputIteratorTransformer[item_sk#6L, rank_col#7]
: : +- ReusedExchange [item_sk#6L, rank_col#7], ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=896], [shuffle_writer_type=hash], [OUTPUT] List(item_sk:LongType, rank_col:DecimalType(11,6))
: +- ^(11) InputIteratorTransformer[i_item_sk#45L, i_product_name#66]
: +- ColumnarBroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=923]
: +- ^(9) FilterExecTransformer isnotnull(i_item_sk#45L)
: +- ^(9) NativeFileScan parquet tpcds_pq100.item[i_item_sk#45L,i_product_name#66] Batched: true, DataFilters: [isnotnull(i_item_sk#45L)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/data3/liangjiabiao/docker/local_gluten/spark-3.3.2-bin-hadoop3/s..., PartitionFilters: [], PushedFilters: [IsNotNull(i_item_sk)], ReadSchema: struct<i_item_sk:bigint,i_product_name:string>
+- ^(11) InputIteratorTransformer[i_item_sk#90L, i_product_name#111]
+- ReusedExchange [i_item_sk#90L, i_product_name#111], ColumnarBroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=923]
```
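The plan above comes from a rank-then-filter pattern: rank items by a metric in both ascending and descending order, keep rank < 11 from each side, and join the two rankings on the rank column. As a rough illustration only (toy data and names, not the actual TPC-DS q44 query or tables), the same logic in plain Python:

```python
# Toy sketch of the q44-style shape behind the plan above:
# rank items by a per-item metric both ways, filter rnk < 11,
# then inner-join the ascending and descending rankings on rnk
# (like the CHSortMergeJoinExecTransformer on [rnk#5] = [rnk#10]).
profits = {"itemA": 5.0, "itemB": -2.0, "itemC": 9.0, "itemD": 1.0}

def rank_by(metric, reverse=False):
    # rank() over (order by metric): 1-based position in sorted order
    # (ties ignored for simplicity in this sketch)
    ordered = sorted(metric.items(), key=lambda kv: kv[1], reverse=reverse)
    return {item: rnk for rnk, (item, _) in enumerate(ordered, start=1)}

asc = rank_by(profits)                 # worst performers ranked first
desc = rank_by(profits, reverse=True)  # best performers ranked first

best = {rnk: item for item, rnk in desc.items() if rnk < 11}
worst = {rnk: item for item, rnk in asc.items() if rnk < 11}

# (rnk, best_performing, worst_performing), ordered by rnk
rows = [(rnk, best[rnk], worst[rnk]) for rnk in sorted(best) if rnk in worst]
print(rows)
```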
Two advantages of WindowGroupLimit:
1. Based on the WindowTransform in CH, we would need to buffer rows for the `unbounded preceding` frame, similar to #6214. WindowGroupLimit will reduce a lot of the required memory.
2. We cannot use WindowTransform to implement this; we have to make a new processor, and it should be simpler than WindowTransform.
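The memory point can be illustrated with a small sketch: a full rank() over an `unbounded preceding` frame has to buffer an entire partition before emitting, while a group-limit style processor only needs state proportional to the limit k. A minimal Python illustration (hypothetical, not CH processor code, and ignoring rank ties for simplicity):

```python
import heapq

def rank_topk(rows, k):
    # WindowGroupLimit-style: keep only the k smallest keys seen so far
    # in a bounded max-heap (negated keys), instead of buffering the
    # whole partition the way a full window rank() must.
    heap = []  # size never exceeds k
    for key in rows:
        if len(heap) < k:
            heapq.heappush(heap, -key)
        elif key < -heap[0]:
            heapq.heapreplace(heap, -key)
    return sorted(-x for x in heap)

# A million input rows, but peak state is only k = 10 entries.
vals = rank_topk(range(1_000_000, 0, -1), k=10)
print(vals)
```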
Description
Part of #6067.
For ds q44, the physical plan is shown above. q67 and q70 also fall back on WindowGroupLimit.