Improve Planning Time - Githubissues

I noticed this case has many aggregations (about 200). When planning an aggregation, the planner wraps a projection for it and creates the corresponding Column for the aggregation expressions. I guess it could be a potential bottleneck 🤔 截圖 2024-10-19 晚上11 53 46

See the flamegraph, I highlight two parts:

purple part: the planner spends much time building a large hash map for columnizing the expressions.
green part: Creating many columns from a schema that invokes many string clones also takes much time.

I did a simple POC #13018 to avoid building the hash map and reduce the clone.

The benchmark result like as below. all-after is optimized. all-before is main branch.

group                                         all-after                              all-before
-----                                         ---------                              ----------
logical_aggregate_with_join                   1.00  1113.2±38.87µs        ? ?/sec    1.01  1125.6±39.87µs        ? ?/sec
logical_plan_tpcds_all                        1.08   225.2±53.43ms        ? ?/sec    1.00    207.8±3.14ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.5±0.50ms        ? ?/sec    1.04     18.1±1.10ms        ? ?/sec
logical_select_all_from_1000                  1.00      9.7±0.24ms        ? ?/sec    1.00      9.7±0.23ms        ? ?/sec
logical_select_one_from_700                   1.00   799.5±17.64µs        ? ?/sec    1.00   797.5±18.42µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   711.7±18.17µs        ? ?/sec    1.00   714.2±26.44µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   685.4±15.69µs        ? ?/sec    1.01   695.2±74.00µs        ? ?/sec
physical_plan_tpcds_all                       1.00  1683.8±19.94ms        ? ?/sec    1.02  1709.8±26.84ms        ? ?/sec
physical_plan_tpch_all                        1.00    106.3±2.50ms        ? ?/sec    1.01    107.5±2.07ms        ? ?/sec
physical_plan_tpch_q1                         1.00      3.6±0.11ms        ? ?/sec    1.01      3.6±0.10ms        ? ?/sec
physical_plan_tpch_q10                        1.00      5.1±0.14ms        ? ?/sec    1.01      5.2±0.16ms        ? ?/sec
physical_plan_tpch_q11                        1.00      4.5±0.12ms        ? ?/sec    1.00      4.5±0.16ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.8±0.11ms        ? ?/sec    1.01      3.8±0.19ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.6±0.13ms        ? ?/sec    1.00      2.6±0.06ms        ? ?/sec
physical_plan_tpch_q14                        1.00      3.1±0.08ms        ? ?/sec    1.00      3.2±0.09ms        ? ?/sec
physical_plan_tpch_q16                        1.00      4.6±0.17ms        ? ?/sec    1.00      4.6±0.12ms        ? ?/sec
physical_plan_tpch_q17                        1.00      4.2±0.11ms        ? ?/sec    1.00      4.2±0.11ms        ? ?/sec
physical_plan_tpch_q18                        1.00      4.8±0.13ms        ? ?/sec    1.01      4.8±0.15ms        ? ?/sec
physical_plan_tpch_q19                        1.00      7.8±0.23ms        ? ?/sec    1.01      7.9±0.28ms        ? ?/sec
physical_plan_tpch_q2                         1.01      9.1±0.53ms        ? ?/sec    1.00      9.0±0.26ms        ? ?/sec
physical_plan_tpch_q20                        1.00      5.6±0.16ms        ? ?/sec    1.00      5.6±0.17ms        ? ?/sec
physical_plan_tpch_q21                        1.00      7.3±0.24ms        ? ?/sec    1.02      7.5±0.33ms        ? ?/sec
physical_plan_tpch_q22                        1.00      4.0±0.11ms        ? ?/sec    1.04      4.1±0.35ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.7±0.14ms        ? ?/sec    1.01      3.8±0.11ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.8±0.07ms        ? ?/sec    1.00      2.8±0.06ms        ? ?/sec
physical_plan_tpch_q5                         1.00      5.3±0.15ms        ? ?/sec    1.00      5.3±0.15ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1838.5±47.53µs        ? ?/sec    1.00  1842.0±33.97µs        ? ?/sec
physical_plan_tpch_q7                         1.00      6.9±0.20ms        ? ?/sec    1.01      7.0±0.21ms        ? ?/sec
physical_plan_tpch_q8                         1.00      8.3±0.22ms        ? ?/sec    1.00      8.3±0.25ms        ? ?/sec
physical_plan_tpch_q9                         1.00      6.4±0.19ms        ? ?/sec    1.00      6.3±0.16ms        ? ?/sec
physical_select_aggregates_from_200           1.00     33.7±0.92ms        ? ?/sec    1.17     39.4±0.86ms        ? ?/sec
physical_select_all_from_1000                 1.00     70.3±1.79ms        ? ?/sec    1.01     70.7±1.77ms        ? ?/sec
physical_select_one_from_700                  1.00      4.2±0.11ms        ? ?/sec    1.00      4.2±0.11ms        ? ?/se

Most queries don't be affected but we can see the case physical_select_aggregates_from_200:

group                                         all-after                              all-before
-----                                         ---------                              ----------
physical_select_aggregates_from_200           1.00     33.7±0.92ms        ? ?/sec    1.17     39.4±0.86ms        ? ?/sec

It reduces about 15% execution time.

I think it could be a point that we can improve 🤔

I am still considering whether it is possible to avoid using columnzied_output_exprs.

apache / datafusion

Improve Planning Time #13015

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context