citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Citus custom query planner does not handle UNION ALL the way the PostgreSQL query planner does #7175

Open arsi opened 1 year ago

arsi commented 1 year ago

I have a view that is a UNION ALL over 26 tables (so the query is just SELECT ... FROM "x-1" UNION ALL SELECT ... FROM "x-2" ...).

When I use native PostgreSQL tables, it uses parallel seq scans. When I distribute the tables by a key (row-based distributed tables), it produces 27 Distributed Subplans but runs them serially. Why can't this run in parallel as well? Schema-based distribution seems to work (since it uses the native query planner), but I would really like to use row-based distribution.
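Roughly, the setup looks like the sketch below. The column names and types (dist_key, payload) are placeholders; only the overall shape (26 tables with the same layout, all distributed by the same key, concatenated in one view) matches the description above.

```sql
-- Placeholder schema: 26 tables with the same layout, all distributed by the
-- same key, and a view that concatenates them with UNION ALL.
CREATE TABLE "x-1" (dist_key bigint, payload text);
CREATE TABLE "x-2" (dist_key bigint, payload text);
-- ... "x-3" through "x-26" are created the same way ...

SELECT create_distributed_table('"x-1"', 'dist_key');
SELECT create_distributed_table('"x-2"', 'dist_key');
-- ... and likewise for the remaining tables ...

CREATE VIEW x AS
SELECT * FROM "x-1"
UNION ALL
SELECT * FROM "x-2"
-- UNION ALL ... continuing through "x-26"
;
```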

Using: Citus 12.0.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18), 64-bit and PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18), 64-bit

PostgreSQL Query planner:

mydata=# explain analyze select * from x;
                                                                  QUERY PLAN
 Gather  (cost=1000.00..532907.96 rows=674493 width=2844) (actual time=11.944..10186.177 rows=668574 loops=1)
   Workers Planned: 5
   Workers Launched: 5
   ->  Parallel Append  (cost=0.00..464458.66 rows=134897 width=2846) (actual time=1.116..9351.628 rows=111429 loops=6)
         ->  Parallel Seq Scan on "x-18"  (cost=0.00..42767.12 rows=23663 width=2866) (actual time=0.786..6798.925 rows=96043 loops=1)
         ->  Parallel Seq Scan on "x-22"  (cost=0.00..37324.45 rows=18384 width=2815) (actual time=1.268..6218.198 rows=73579 loops=1)
         ->  Parallel Seq Scan on "x-19"  (cost=0.00..35700.12 rows=23932 width=2856) (actual time=0.057..4686.369 rows=66811 loops=1)
         ->  Parallel Seq Scan on "x-17"  (cost=0.00..27775.92 rows=14942 width=2814) (actual time=0.896..3102.783 rows=46319 loops=1)
         ->  Parallel Seq Scan on "x-25"  (cost=0.00..27124.91 rows=13891 width=2807) (actual time=3.139..5775.030 rows=43061 loops=1)
         ->  Parallel Seq Scan on "x-5"  (cost=0.00..26901.59 rows=12440 width=2907) (actual time=2.006..1719.050 rows=38565 loops=1)
         ->  Parallel Seq Scan on "x-16"  (cost=0.00..26202.44 rows=13448 width=2862) (actual time=2.410..3454.751 rows=41689 loops=1)
         ->  Parallel Seq Scan on "x-11"  (cost=0.00..24990.08 rows=11799 width=2887) (actual time=4.916..2653.725 rows=36576 loops=1)
         ->  Parallel Seq Scan on "x-23"  (cost=0.00..23816.53 rows=10231 width=2813) (actual time=0.548..1665.396 rows=15858 loops=2)
         ->  Parallel Seq Scan on "x-21"  (cost=0.00..22987.02 rows=9492 width=2854) (actual time=1.079..707.796 rows=4909 loops=6)
         ->  Parallel Seq Scan on "x-20"  (cost=0.00..21928.22 rows=8772 width=2849) (actual time=0.465..997.019 rows=13597 loops=2)
         ->  Parallel Seq Scan on "x-15"  (cost=0.00..20817.79 rows=8052 width=2859) (actual time=1.134..1168.647 rows=8320 loops=3)
         ->  Parallel Seq Scan on "x-9"  (cost=0.00..18780.08 rows=6071 width=2882) (actual time=0.959..1002.836 rows=18819 loops=1)
         ->  Parallel Seq Scan on "x-26"  (cost=0.00..18322.80 rows=6223 width=2794) (actual time=0.656..2637.623 rows=19292 loops=1)
         ->  Parallel Seq Scan on "x-6"  (cost=0.00..10806.16 rows=2985 width=2911) (actual time=0.956..419.030 rows=9254 loops=1)
         ->  Parallel Seq Scan on "x-4"  (cost=0.00..10573.15 rows=2980 width=2734) (actual time=1.250..736.876 rows=9238 loops=1)
         ->  Parallel Seq Scan on "x-1"  (cost=0.00..9289.09 rows=2115 width=2790) (actual time=1.249..290.801 rows=5076 loops=1)
         ->  Parallel Seq Scan on "x-8"  (cost=0.00..9137.91 rows=2565 width=2881) (actual time=1.897..330.567 rows=6156 loops=1)
         ->  Parallel Seq Scan on "x-13"  (cost=0.00..8737.35 rows=3632 width=2846) (actual time=0.530..558.622 rows=8717 loops=1)
         ->  Parallel Seq Scan on "x-2"  (cost=0.00..8004.51 rows=2143 width=2760) (actual time=4.306..362.954 rows=5144 loops=1)
         ->  Parallel Seq Scan on "x-7"  (cost=0.00..7129.47 rows=1971 width=2894) (actual time=1.694..240.027 rows=4730 loops=1)
         ->  Parallel Seq Scan on "x-14"  (cost=0.00..6679.42 rows=2847 width=2853) (actual time=0.391..535.305 rows=6833 loops=1)
         ->  Parallel Seq Scan on "x-3"  (cost=0.00..5843.50 rows=1667 width=2759) (actual time=1.346..292.045 rows=4000 loops=1)
         ->  Parallel Seq Scan on "x-24"  (cost=0.00..5529.11 rows=3676 width=2827) (actual time=0.681..818.353 rows=8822 loops=1)
         ->  Parallel Seq Scan on "x-12"  (cost=0.00..4616.69 rows=1503 width=2864) (actual time=0.039..180.761 rows=3607 loops=1)
         ->  Parallel Seq Scan on "x-10"  (cost=0.00..1998.75 rows=1717 width=2883) (actual time=0.546..147.134 rows=2919 loops=1)
 Planning Time: 101.069 ms
 Execution Time: 10213.934 ms
(32 rows)

Citus Custom Scan Query Planner:

mydata=> explain analyze select * from x;
                                                                  QUERY PLAN
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=0 width=0) (actual time=25271.495..25961.113 rows=668574 loops=1)
   ->  Distributed Subplan 1_1
         Subplan Duration: 410.39 ms
         Intermediate Data Size: 16 MB
         Result destination: Write locally
         ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=4022) (actual time=136.003..136.827 rows=5076 loops=1)
               Task Count: 32
               Tuple data received from nodes: 13 MB
               Tasks Shown: One of 32
               ->  Task
                     Tuple data received from node: 3149 kB
                     Node: host=1.1.1.1 port=5432 dbname=mydata
                     ->  Seq Scan on "x-1_102022" "x-1"  (cost=0.00..663.46 rows=1158 width=2804) (actual time=0.088..58.356 rows=1158 loops=1)
                   Planning Time: 0.165 ms
                   Execution Time: 63.909 ms
         Planning Time: 0.000 ms
         Execution Time: 137.427 ms
   ->  Distributed Subplan 1_2
         Subplan Duration: 198.66 ms
         Intermediate Data Size: 16 MB
         Result destination: Write locally
         ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=4022) (actual time=142.401..143.229 rows=5144 loops=1)
               Task Count: 32
               Tuple data received from nodes: 13 MB
               Tasks Shown: One of 32
               ->  Task
                     Tuple data received from node: 3520 kB
                     Node: host=1.1.1.1 port=5432 dbname=mydata
                     ->  Seq Scan on "x-2_102054" "x-2"  (cost=0.00..733.21 rows=1292 width=2777) (actual time=0.079..63.338 rows=1292 loops=1)
                   Planning Time: 0.134 ms
                   Execution Time: 68.751 ms
         Planning Time: 0.000 ms
         Execution Time: 143.899 ms
   ->  Distributed Subplan 1_3
   ...
   Distributed Subplan 1_26
   ->  Distributed Subplan 1_27
         Subplan Duration: 24442.69 ms
         Intermediate Data Size: 2362 MB
         Result destination: Write locally
         ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=0 width=0) (actual time=17673.567..18368.563 rows=668574 loops=1)
               Task Count: 1
               Tuple data received from nodes: 1970 MB
               Tasks Shown: All
               ->  Task
                     Tuple data received from node: 1970 MB
                     Node: host=localhost port=5432 dbname=mydata
                     ->  Gather  (cost=1000.00..105762.91 rows=26000 width=4022) (actual time=50.153..1047.263 rows=668574 loops=1)
                           Workers Planned: 5
                           Workers Launched: 5
                           ->  Parallel Append  (cost=0.00..102162.91 rows=106829 width=4022) (actual time=764.116..1590.658 rows=111429 loops=6)
                                 ->  Function Scan on read_intermediate_result intermediate_result_17  (cost=0.00..74263.04 rows=78302 width=4022) (actual time=1316.678..1433.456 rows=96043 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_21  (cost=0.00..57176.61 rows=60286 width=4022) (actual time=1031.296..1119.719 rows=73579 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_18  (cost=0.00..52058.16 rows=54889 width=4022) (actual time=942.834..1021.694 rows=66811 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_16  (cost=0.00..35268.77 rows=37187 width=4022) (actual time=648.570..702.491 rows=46319 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_24  (cost=0.00..33186.20 rows=34991 width=4022) (actual time=603.903..653.121 rows=43061 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_15  (cost=0.00..32206.30 rows=33958 width=4022) (actual time=534.989..585.173 rows=41689 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_10  (cost=0.00..27414.12 rows=28905 width=4022) (actual time=434.296..478.152 rows=36576 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_4  (cost=0.00..27145.38 rows=28622 width=4022) (actual time=433.297..476.888 rows=38565 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_22  (cost=0.00..24244.07 rows=25563 width=4022) (actual time=412.542..450.160 rows=31717 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_20  (cost=0.00..22690.84 rows=23925 width=4022) (actual time=363.916..398.711 rows=29452 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_19  (cost=0.00..21044.75 rows=22189 width=4022) (actual time=336.200..367.755 rows=27194 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_14  (cost=0.00..19159.83 rows=20202 width=4022) (actual time=308.422..337.299 rows=24961 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_25  (cost=0.00..14138.91 rows=14908 width=4022) (actual time=234.564..255.000 rows=19292 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_8  (cost=0.00..13473.64 rows=14206 width=4022) (actual time=220.137..239.941 rows=18819 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_12  (cost=0.00..6542.35 rows=6898 width=4022) (actual time=92.148..94.095 rows=8717 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_5  (cost=0.00..6530.84 rows=6886 width=4022) (actual time=93.769..95.633 rows=9254 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_3  (cost=0.00..6399.14 rows=6747 width=4022) (actual time=120.001..121.627 rows=9238 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_23  (cost=0.00..6373.47 rows=6720 width=4022) (actual time=132.529..133.768 rows=8822 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_13  (cost=0.00..5146.78 rows=5427 width=4022) (actual time=84.388..85.403 rows=6833 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_7  (cost=0.00..4354.01 rows=4591 width=4022) (actual time=74.876..75.668 rows=6156 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_1  (cost=0.00..3524.35 rows=3716 width=4022) (actual time=68.978..69.630 rows=5144 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result  (cost=0.00..3433.25 rows=3620 width=4022) (actual time=72.051..72.704 rows=5076 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_6  (cost=0.00..3313.55 rows=3494 width=4022) (actual time=61.486..62.134 rows=4730 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_2  (cost=0.00..2757.53 rows=2908 width=4022) (actual time=61.316..62.016 rows=4000 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_11  (cost=0.00..2635.48 rows=2779 width=4022) (actual time=52.626..53.165 rows=3607 loops=1)
                                 ->  Function Scan on read_intermediate_result intermediate_result_9  (cost=0.00..2114.15 rows=2229 width=4022) (actual time=41.404..41.822 rows=2919 loops=1)
                   Planning Time: 8.388 ms
                   Execution Time: 2547.809 ms
         Planning Time: 0.000 ms
         Execution Time: 18587.141 ms
   Task Count: 1
   Tuple data received from nodes: 1970 MB
   Tasks Shown: All
   ->  Task
         Tuple data received from node: 1970 MB
         Node: host=localhost port=5432 dbname=mydata
         ->  Function Scan on read_intermediate_result intermediate_result  (cost=0.00..506595.55 rows=534148 width=4022) (actual time=7892.106..8627.884 rows=668574 loops=1)
       Planning Time: 0.188 ms
       Execution Time: 9965.010 ms
 Planning Time: 366.923 ms
 Execution Time: 78025.057 ms
(476 rows)
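The plan above only shows one task out of the 32 per subplan. If per-shard timings would help, it can be re-run with every task printed via the citus.explain_all_tasks setting (the output gets even longer):

```sql
-- Print every task of every Citus subplan instead of "Tasks Shown: One of 32".
SET citus.explain_all_tasks TO on;
EXPLAIN ANALYZE SELECT * FROM x;
```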

DS-AdamMilazzo commented 11 months ago

Are you sure they're run serially? There seems to be some overlap in the time ranges in the Citus result, albeit much less...
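One way to check outside of EXPLAIN (a sketch, not something tried in this thread): while the big query is running, poll activity on one of the worker nodes and see whether shard queries for more than one of the "x-N" tables are active at the same moment.

```sql
-- Run repeatedly on a worker (or through the coordinator's citus_stat_activity
-- view) while the query executes.  If the subplans really run one after
-- another, the active shard queries at any instant should all target shards
-- of a single "x-N" table.
SELECT pid, query_start, left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY query_start;
```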