citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Rebalancing by disk size takes long to plan no moves #7021

Open JelteF opened 1 year ago

JelteF commented 1 year ago
select get_rebalance_table_shards_plan(rebalance_strategy:='by_disk_size');
 get_rebalance_table_shards_plan
---------------------------------
(0 rows)

Time: 11154.771 ms (00:11.155)

All of this time seems to be spent fetching the sizes of shard groups: it runs one query per shard group to get the size of the complete shard group. We could optimize this by fetching the sizes of all shards/shard groups from a node in one go, probably by using citus_shard_sizes().
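A minimal sketch of the batching idea, in Python rather than the C of the actual rebalancer: fetch every shard size from a node once, then sum the sizes per shard group in memory. The row shape, the shard ids, the group names, and the helper `group_sizes_batched` are all hypothetical and chosen only for illustration; they do not reflect the real output of citus_shard_sizes().

```python
from collections import defaultdict

# Hypothetical rows as a single per-node query might return them:
# (shard_id, size_in_bytes). This shape is an assumption for illustration.
node_shard_sizes = [
    (102008, 4096),
    (102040, 8192),
    (102009, 16384),
    (102041, 1024),
]

# Hypothetical mapping from shard id to shard group
# (colocated shards belong to the same group).
shard_to_group = {
    102008: "group_a",
    102040: "group_a",
    102009: "group_b",
    102041: "group_b",
}


def group_sizes_batched(rows, shard_to_group):
    """Sum shard sizes per shard group from one batch of rows."""
    totals = defaultdict(int)
    for shard_id, size in rows:
        totals[shard_to_group[shard_id]] += size
    return dict(totals)


# One pass over one result set, instead of one query per shard group.
sizes = group_sizes_batched(node_shard_sizes, shard_to_group)
```

With this approach the number of round trips scales with the number of nodes rather than the number of shard groups.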

onderkalaci commented 1 year ago

This is even the case for the background rebalancer, citus_rebalance_start.

JelteF commented 1 year ago

Yes, that's correct. The planning itself is not done in the background, only the execution. Still, I don't think it's a huge issue, since those 11 seconds were with ~8000 shard groups.
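As a rough back-of-the-envelope check, assuming the whole planning time went into the per-shard-group size queries and taking the approximate count of 8000 shard groups at face value, the cost per query works out to about 1.4 ms:

```python
# Figures taken from the thread: total planning time and approximate group count.
total_ms = 11154.771
shard_groups = 8000  # approximate ("~8000")

# Assumes all of the time is spent in one size query per shard group.
per_group_ms = total_ms / shard_groups
```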