apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.48k stars 1.01k forks source link

[Epic] A collection of issues to improve planning performance / speed / efficiency #5637

Open alamb opened 1 year ago

alamb commented 1 year ago

This is a collection of tickets related to making DataFusion's planning speed faster. Planning speed is the time from a SQL string being created to when the ExecutionPlan is created

karlovnv commented 9 months ago

Also I'd like to consider replace list in DFSchema by case_insensitive_hashmap or something similar in order to get value with O(1) complexity instead of O(N).

As I understand, now complexity is O(N^2) due two loops of iterations (datafusion_common::dfschema::DFSchema::index_of_column_by_name and datafusion_common::table_reference::TableReference::resolved_eq)

alamb commented 9 months ago

Yes, I think there is a lot of room for improvement (though we need to be careful about taking on crate dependencies that might not have a good long term maintenance story)

alamb commented 3 months ago

Here are some other recent discussions about how to improve planning speed:

alamb commented 2 months ago

An update here. Thanks to a bunch of work by @haohuaijin @matthewmturner @jayzhan211 @peter-toth @jackwener and myself, the planning speed on 38.0.0 is looking to be quite a bit better 20%-700% better in many cases. I am fairly confident there is still another factor of 2 to be had by completing https://github.com/apache/arrow-datafusion/issues/9637, which I expect to complete over the next few weeks

+ critcmp main 37.0.0
group                                         37.0.0                                 main
-----                                         ------                                 ----
logical_aggregate_with_join                   1.05  1271.9±10.23µs        ? ?/sec    1.00  1210.1±16.14µs        ? ?/sec
logical_plan_tpcds_all                        1.07    167.2±1.37ms        ? ?/sec    1.00    156.4±0.88ms        ? ?/sec
logical_plan_tpch_all                         1.01     17.2±0.18ms        ? ?/sec    1.00     17.0±0.15ms        ? ?/sec
logical_select_all_from_1000                  4.84     93.5±0.41ms        ? ?/sec    1.00     19.3±0.10ms        ? ?/sec
logical_select_one_from_700                   1.00   751.6±12.41µs        ? ?/sec    1.06    795.9±8.14µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.06   795.8±10.91µs        ? ?/sec    1.00    750.1±8.38µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.04   764.2±18.21µs        ? ?/sec    1.00   737.4±18.35µs        ? ?/sec
physical_plan_tpcds_all                       1.46       2.2±0.01s        ? ?/sec    1.00   1479.1±3.64ms        ? ?/sec
physical_plan_tpch_all                        1.35    134.5±0.81ms        ? ?/sec    1.00     99.6±0.77ms        ? ?/sec
physical_plan_tpch_q1                         1.43      7.7±0.06ms        ? ?/sec    1.00      5.4±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.38      6.4±0.05ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.24      5.1±0.03ms        ? ?/sec    1.00      4.1±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.25      4.1±0.02ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q13                        1.22      2.7±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.22      3.5±0.02ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.33      5.3±0.02ms        ? ?/sec    1.00      4.0±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.29      4.9±0.03ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q18                        1.33      5.5±0.06ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.29     10.1±0.09ms        ? ?/sec    1.00      7.9±0.05ms        ? ?/sec
physical_plan_tpch_q2                         1.44     12.3±0.09ms        ? ?/sec    1.00      8.5±0.06ms        ? ?/sec
physical_plan_tpch_q20                        1.32      6.4±0.05ms        ? ?/sec    1.00      4.9±0.02ms        ? ?/sec
physical_plan_tpch_q21                        1.41      9.5±0.03ms        ? ?/sec    1.00      6.8±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.29      4.7±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.27      4.2±0.03ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.39      3.4±0.02ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.27      6.1±0.06ms        ? ?/sec    1.00      4.8±0.03ms        ? ?/sec
physical_plan_tpch_q6                         1.17      2.1±0.01ms        ? ?/sec    1.00  1752.6±12.06µs        ? ?/sec
physical_plan_tpch_q7                         1.39      8.7±0.08ms        ? ?/sec    1.00      6.2±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.52     12.2±0.08ms        ? ?/sec    1.00      8.0±0.03ms        ? ?/sec
physical_plan_tpch_q9                         1.53      9.2±0.06ms        ? ?/sec    1.00      6.0±0.05ms        ? ?/sec
physical_select_all_from_1000                 7.42    683.8±1.12ms        ? ?/sec    1.00     92.2±0.49ms        ? ?/sec
physical_select_one_from_700                  1.12      4.2±0.02ms        ? ?/sec    1.00      3.7±0.04ms        ? ?/sec

I compared 37.0.0 (with the tpcds benchmark) on this branch: https://github.com/alamb/arrow-datafusion/tree/alamb/37_bench

Comparison

``` ```shell set -x -e ## This script tests planning speed of 37.0.0 against the speed on planning on main git fetch -p apache git fetch -p alamb # remove old test runs rm -rf target/criterion/ # use a version of 37 with the tpcds benchmarks BRANCH_NAME="37.0.0" git checkout alamb/37_bench git reset --hard alamb/alamb/37_bench cargo update cargo bench --bench sql_planner -- --save-baseline ${BRANCH_NAME} echo "** Comparing to main" git checkout main git reset --hard apache/main cargo update cargo bench --bench sql_planner -- --save-baseline main critcmp main ${BRANCH_NAME} ```

karlovnv commented 2 months ago

@alamb Hi, amazing work have been done! It's became much more speedy.

But it seems that the complexity of algorithms is still O(n^2) Here we have graph avg query execution time over the number of columns:

image
alamb commented 2 months ago

I agree there are still places that are N^2 in the number of columns.

With @haohuaijin 's great work in https://github.com/apache/datafusion/pull/9595 I think adding an index (perhaps computed on demand) to DFSchema might be more tractable to do without causing performance regressions for smaller column counts.

It would be great if someone wanted to give that a try

matthewmturner commented 2 months ago

We recently updated to the latest Datafusion and we've seen our planning time go from ~20ms to ~10ms! Great job on this.

alamb commented 2 months ago

We recently updated to the latest Datafusion and we've seen our planning time go from ~20ms to ~10ms! Great job on this.

That is great to hear --- thanks for the report @matthewmturner

BTW I think there is still significant improvement to be had by completing https://github.com/apache/datafusion/issues/9637. I don't think we'll get it all done by 38.0.0 but I think we'll improve it some more

alamb commented 1 month ago

Current progress

group                                         37.0.0                                 main
-----                                         ------                                 ----
logical_aggregate_with_join                   1.07  1281.7±14.79µs        ? ?/sec    1.00  1198.9±14.54µs        ? ?/sec
logical_plan_tpcds_all                        1.09    172.4±1.92ms        ? ?/sec    1.00    157.5±1.68ms        ? ?/sec
logical_plan_tpch_all                         1.06     17.7±0.24ms        ? ?/sec    1.00     16.7±0.18ms        ? ?/sec
logical_select_all_from_1000                  5.17     96.2±0.48ms        ? ?/sec    1.00     18.6±0.14ms        ? ?/sec
logical_select_one_from_700                   1.00    739.2±8.71µs        ? ?/sec    1.09   809.4±35.23µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.06   797.4±34.31µs        ? ?/sec    1.00    750.7±8.05µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.02    752.9±6.78µs        ? ?/sec    1.00    738.4±9.07µs        ? ?/sec
physical_plan_tpcds_all                       1.65       2.2±0.01s        ? ?/sec    1.00   1340.8±8.03ms        ? ?/sec
physical_plan_tpch_all                        1.54    139.9±1.25ms        ? ?/sec    1.00     90.8±1.36ms        ? ?/sec
physical_plan_tpch_q1                         1.59      8.2±0.06ms        ? ?/sec    1.00      5.1±0.07ms        ? ?/sec
physical_plan_tpch_q10                        1.53      6.6±0.06ms        ? ?/sec    1.00      4.3±0.09ms        ? ?/sec
physical_plan_tpch_q11                        1.36      5.3±0.10ms        ? ?/sec    1.00      3.9±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.42      4.3±0.10ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_tpch_q13                        1.33      2.8±0.04ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.35      3.7±0.07ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.46      5.5±0.09ms        ? ?/sec    1.00      3.7±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.46      5.1±0.08ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.44      5.7±0.09ms        ? ?/sec    1.00      3.9±0.09ms        ? ?/sec
physical_plan_tpch_q19                        1.65     10.3±0.09ms        ? ?/sec    1.00      6.3±0.09ms        ? ?/sec
physical_plan_tpch_q2                         1.62     12.6±0.10ms        ? ?/sec    1.00      7.8±0.11ms        ? ?/sec
physical_plan_tpch_q20                        1.46      6.7±0.09ms        ? ?/sec    1.00      4.6±0.08ms        ? ?/sec
physical_plan_tpch_q21                        1.58      9.9±0.09ms        ? ?/sec    1.00      6.2±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.47      5.0±0.07ms        ? ?/sec    1.00      3.4±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.40      4.4±0.06ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q4                         1.50      3.5±0.06ms        ? ?/sec    1.00      2.3±0.06ms        ? ?/sec
physical_plan_tpch_q5                         1.43      6.4±0.10ms        ? ?/sec    1.00      4.4±0.08ms        ? ?/sec
physical_plan_tpch_q6                         1.37      2.1±0.05ms        ? ?/sec    1.00  1572.0±16.95µs        ? ?/sec
physical_plan_tpch_q7                         1.60      8.9±0.11ms        ? ?/sec    1.00      5.6±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.72     12.5±0.09ms        ? ?/sec    1.00      7.3±0.11ms        ? ?/sec
physical_plan_tpch_q9                         1.71      9.5±0.10ms        ? ?/sec    1.00      5.6±0.10ms        ? ?/sec
physical_select_all_from_1000                 11.52   701.4±1.22ms        ? ?/sec    1.00     60.9±0.32ms        ? ?/sec
physical_select_one_from_700                  1.17      4.2±0.06ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec

50% faster for tpcds and tpch planning

physical_plan_tpcds_all                       1.65       2.2±0.01s        ? ?/sec    1.00   1340.8±8.03ms        ? ?/sec
physical_plan_tpch_all                        1.54    139.9±1.25ms        ? ?/sec    1.00     90.8±1.36ms        ? ?/sec

Note I expect another 30-40% combined savings between https://github.com/apache/datafusion/pull/10356 and https://github.com/apache/datafusion/issues/10209 and https://github.com/apache/datafusion/issues/9873

alamb commented 1 month ago

Here is where we currently stand with planning performance compared to 37 and 38

Highlight: TPC-DS 76% faster planning, TPCH 64% faster

group                                         37.0.0                                 38.0.0                                 main
-----                                         ------                                 ------                                 ----
physical_plan_tpcds_all                       1.76       2.2±0.01s        ? ?/sec    1.06  1322.5±10.01ms        ? ?/sec    1.00  1253.2±10.31ms        ? ?/sec
physical_plan_tpch_all                        1.64    140.6±2.49ms        ? ?/sec    1.04     89.5±1.43ms        ? ?/sec    1.00     85.8±1.53ms        ? ?/sec

Highlight: SELECT * .. with 1000 columns is 11x faster

group                                         37.0.0                                 38.0.0                                 main
-----                                         ------                                 ------                                 ----
physical_select_all_from_1000                 11.37   689.9±1.24ms        ? ?/sec    1.00     60.7±0.29ms        ? ?/sec    1.01     61.4±0.34ms        ? ?/sec
physical_select_one_from_700                  1.17      4.1±0.04ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec    1.01      3.6±0.06ms        ? ?/sec
++ critcmp main 38.0.0 37.0.0
group                                         37.0.0                                 38.0.0                                 main
-----                                         ------                                 ------                                 ----
logical_aggregate_with_join                   1.27  1275.1±14.29µs        ? ?/sec    1.22  1219.9±21.06µs        ? ?/sec    1.00  1003.7±14.68µs        ? ?/sec
logical_plan_tpcds_all                        1.14    171.6±2.25ms        ? ?/sec    1.05    157.8±1.74ms        ? ?/sec    1.00    150.9±1.77ms        ? ?/sec
logical_plan_tpch_all                         1.08     17.9±0.35ms        ? ?/sec    1.02     17.0±0.17ms        ? ?/sec    1.00     16.6±0.18ms        ? ?/sec
logical_select_all_from_1000                  5.05     94.5±0.86ms        ? ?/sec    1.00     18.7±0.10ms        ? ?/sec    1.01     18.9±0.16ms        ? ?/sec
logical_select_one_from_700                   1.00   750.1±11.67µs        ? ?/sec    1.08   810.9±32.07µs        ? ?/sec    1.08   811.4±16.40µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.05   793.2±10.89µs        ? ?/sec    1.01    762.2±8.23µs        ? ?/sec    1.00   756.0±12.71µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.03   762.6±12.18µs        ? ?/sec    1.00   741.5±10.89µs        ? ?/sec    1.00    740.0±7.52µs        ? ?/sec
physical_plan_tpcds_all                       1.76       2.2±0.01s        ? ?/sec    1.06  1322.5±10.01ms        ? ?/sec    1.00  1253.2±10.31ms        ? ?/sec
physical_plan_tpch_all                        1.64    140.6±2.49ms        ? ?/sec    1.04     89.5±1.43ms        ? ?/sec    1.00     85.8±1.53ms        ? ?/sec
physical_plan_tpch_q1                         1.81      8.2±0.08ms        ? ?/sec    1.10      5.0±0.13ms        ? ?/sec    1.00      4.5±0.06ms        ? ?/sec
physical_plan_tpch_q10                        1.65      6.7±0.09ms        ? ?/sec    1.07      4.3±0.06ms        ? ?/sec    1.00      4.0±0.05ms        ? ?/sec
physical_plan_tpch_q11                        1.49      5.3±0.11ms        ? ?/sec    1.07      3.8±0.08ms        ? ?/sec    1.00      3.5±0.07ms        ? ?/sec
physical_plan_tpch_q12                        1.59      4.3±0.07ms        ? ?/sec    1.12      3.0±0.05ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec
physical_plan_tpch_q13                        1.39      2.8±0.04ms        ? ?/sec    1.02      2.1±0.03ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.51      3.6±0.06ms        ? ?/sec    1.12      2.7±0.08ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_plan_tpch_q16                        1.63      5.5±0.07ms        ? ?/sec    1.09      3.7±0.13ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_tpch_q17                        1.57      5.1±0.07ms        ? ?/sec    1.07      3.5±0.10ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.56      5.7±0.10ms        ? ?/sec    1.06      3.9±0.12ms        ? ?/sec    1.00      3.7±0.06ms        ? ?/sec
physical_plan_tpch_q19                        1.93     10.6±0.09ms        ? ?/sec    1.11      6.1±0.14ms        ? ?/sec    1.00      5.5±0.10ms        ? ?/sec
physical_plan_tpch_q2                         1.71     12.8±0.07ms        ? ?/sec    1.03      7.7±0.15ms        ? ?/sec    1.00      7.5±0.12ms        ? ?/sec
physical_plan_tpch_q20                        1.61      6.9±0.13ms        ? ?/sec    1.02      4.4±0.07ms        ? ?/sec    1.00      4.3±0.12ms        ? ?/sec
physical_plan_tpch_q21                        1.69     10.0±0.16ms        ? ?/sec    1.03      6.1±0.12ms        ? ?/sec    1.00      5.9±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.56      4.9±0.13ms        ? ?/sec    1.04      3.3±0.06ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.49      4.4±0.10ms        ? ?/sec    1.07      3.2±0.11ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
physical_plan_tpch_q4                         1.58      3.5±0.05ms        ? ?/sec    1.05      2.3±0.07ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.49      6.3±0.10ms        ? ?/sec    1.05      4.4±0.09ms        ? ?/sec    1.00      4.2±0.07ms        ? ?/sec
physical_plan_tpch_q6                         1.48      2.1±0.05ms        ? ?/sec    1.08  1553.5±30.77µs        ? ?/sec    1.00  1435.2±12.73µs        ? ?/sec
physical_plan_tpch_q7                         1.71      9.1±0.08ms        ? ?/sec    1.05      5.6±0.08ms        ? ?/sec    1.00      5.3±0.07ms        ? ?/sec
physical_plan_tpch_q8                         1.84     12.7±0.17ms        ? ?/sec    1.05      7.3±0.12ms        ? ?/sec    1.00      6.9±0.11ms        ? ?/sec
physical_plan_tpch_q9                         1.86      9.7±0.08ms        ? ?/sec    1.04      5.4±0.10ms        ? ?/sec    1.00      5.2±0.08ms        ? ?/sec
physical_select_all_from_1000                 11.37   689.9±1.24ms        ? ?/sec    1.00     60.7±0.29ms        ? ?/sec    1.01     61.4±0.34ms        ? ?/sec
physical_select_one_from_700                  1.17      4.1±0.04ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec    1.01      3.6±0.06ms        ? ?/sec

Test script

Details

```bash set -x -e ## This script tests planning speed of 37.0.0 against the speed on planning on main git fetch -p apache git fetch -p alamb # remove old test runs rm -rf target/criterion/ # Compare version 38 git checkout 38.0.0 cargo update cargo bench --bench sql_planner -- --save-baseline "38.0.0" # use a version of 37 with the tpcds benchmarks git checkout alamb/37_bench git reset --hard alamb/alamb/37_bench cargo update cargo bench --bench sql_planner -- --save-baseline "37.0.0" echo "** Comparing to main" git checkout main git reset --hard apache/main cargo update cargo bench --bench sql_planner -- --save-baseline main critcmp main "38.0.0" "37.0.0" ```