Open zenoyang opened 9 months ago
try this
set new_planner_agg_stage = 3
Still very slow. In our production environment, new_planner_agg_stage is 4.
Mark
@imay PTAL
Which version are you using? Can you try this PR (https://github.com/StarRocks/starrocks/pull/40549)? It improves the count distinct rewrite strategy.
Version 3.2.8. Yes, we have tried this PR. After turning on the prefer_cte_rewrite parameter, most count distinct queries are converted to group by + count execution, and the performance is good enough.
However, there are still a few cases where the plan cannot be rewritten, for example when the distinct column is a complex case when expression. @LiShuMing
Supplement: If the rewrite is not possible (multi_distinct_count is still executed), it can very easily cause BE OOM. Currently, we use the large query fuse to avoid this.
Enhancement
We have a query with multiple count distinct indicators and a limit. Because of the limit, the optimizer did not convert it into a group by + count query plan, and instead used the multi_distinct_count function for deduplication. The performance is too poor and the query times out. For example:
This query is very fast, with BE returning results in tens of seconds.
This query is very slow and times out after 50 minutes.
Note: event_id is a varchar column with a cardinality of about 300 million.
profile is as follows:
The perf analysis results are as follows:
The main bottleneck is the phmap::priv::raw_hash_set<phmap::priv::FlatHashSetPolicy<starrocks::SliceWithHash>, starrocks::HashOnSliceWithHash, starrocks::EqualOnSliceWithHash, std::allocator<starrocks::SliceWithHash> >::prepare_insert method, called inside the starrocks::DistinctAggregateState<(starrocks::LogicalType)13, (starrocks::LogicalType)13, int>::deserialize_and_merge method.
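To illustrate why the merge step can dominate, here is a toy Python model of the two strategies discussed in this thread. It is only a sketch and not StarRocks code: the function names merge_distinct_sets and shuffle_then_count, the bucket count, and the sample data are all hypothetical. The first function mimics a multi_distinct_count-style merge, where every serialized distinct value is re-inserted into a single hash set. The second mimics the group by + count style, where values are repartitioned by hash first, so each bucket deduplicates only its own share and the final stage just adds up counts.

```python
import zlib
from collections import defaultdict


def merge_distinct_sets(partials):
    """Merge per-fragment distinct sets into one set (multi_distinct_count style).

    Returns (distinct_count, number_of_hash_set_inserts).
    """
    merged = set()
    inserts = 0
    for partial in partials:
        for value in partial:
            # One hash-set insert per serialized value, all on a single merger.
            merged.add(value)
            inserts += 1
    return len(merged), inserts


def shuffle_then_count(partials, num_buckets):
    """Repartition values by hash, deduplicate per bucket, then sum the counts.

    Returns (distinct_count, size_of_largest_bucket).
    """
    buckets = defaultdict(set)
    for partial in partials:
        for value in partial:
            # Equal values always land in the same bucket, so buckets are disjoint.
            bucket_id = zlib.crc32(value.encode("utf-8")) % num_buckets
            buckets[bucket_id].add(value)
    sizes = [len(values) for values in buckets.values()]
    return sum(sizes), max(sizes)


# Hypothetical sample data: three fragments with overlapping event ids.
partials = [
    {f"event_{i}" for i in range(start, start + 1000)}
    for start in (0, 500, 1000)
]

distinct, inserts = merge_distinct_sets(partials)
total, largest_bucket = shuffle_then_count(partials, num_buckets=4)
print(distinct, inserts, total, largest_bucket)
```

In this model both strategies return the same distinct count, but the first one funnels every insert through one hash set, while the second spreads the deduplication work across buckets. With a varchar column of roughly 300 million distinct values, that single-set merge is consistent with the prepare_insert hotspot reported above, though the real cost also depends on details this sketch ignores.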