Closed sundy-li closed 1 week ago
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
This pr pushes rank limit into aggregate partial node.
For example:
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase ORDER BY UserID, SearchPhrase LIMIT 10;
As the group keys may have high cardinality, the HT will resize multiple times to hold the coming keys which is inefficient.
Now we will filter group keys using partial sort with LimitType::Rank(n) to keep top n unique keys.
LimitType::Rank(n)
So we inject a SortPartialTransform before partial agg now.
SortPartialTransform
๐ณ :) explain pipeline SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase ORDER BY UserID, SearchPhrase LIMIT 10; -[ EXPLAIN ]----------------------------------- CompoundBlockOperator(Project) ร 1 LimitTransform ร 1 Merge to MultiSortMerge ร 1 TransformSortSpill ร 16 TransformSortMergeLimit ร 16 SortPartialTransform ร 16 TransformFinalAggregate ร 16 TransformSpillReader ร 16 Merge to Resize ร 16 Merge to TransformPartitionBucket ร 1 TransformAggregateSpillWriter ร 16 TransformPartialAggregate ร 16 SortPartialTransform ร 16 NativeDeserializeDataTransform ร 16 SyncReadNativeDataSource ร 16
Performance in my PC is 2.8 sec --> 0.35 sec.
This change isโ
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This pr pushes rank limit into aggregate partial node.
For example:
As the group keys may have high cardinality, the HT will resize multiple times to hold the coming keys which is inefficient.
Now we will filter group keys using partial sort with
LimitType::Rank(n)
to keep top n unique keys.So we inject a
SortPartialTransform
before partial agg now.Performance in my PC is 2.8 sec --> 0.35 sec.
Tests
Type of change
This change isโ