idegtiarenko opened 1 year ago
Pinging @elastic/es-distributed (Team:Distributed)
After discussing this with the team, we decided that we should limit the number of rebalances at the per-node level, similar to cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries (with a default value of 1 per node), as this is the safest option.
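To illustrate why a per-node limit scales where a cluster-wide one does not, here is a minimal sketch of greedy admission under separate incoming and outgoing caps per node. It is a simplified model, not Elasticsearch code: the `Move` type and `admit_rebalances` function are hypothetical names, and it assumes each rebalance relocates exactly one shard from one source node to one target node.

```python
from collections import Counter

# Hypothetical model: one rebalance moves one shard from source to target.
# These names are illustrative only, not Elasticsearch APIs.
Move = tuple[str, str]  # (source_node, target_node)


def admit_rebalances(candidates: list[Move], per_node_limit: int = 1) -> list[Move]:
    """Greedily start rebalances in order, skipping any move whose source
    already has per_node_limit outgoing rebalances or whose target already
    has per_node_limit incoming rebalances."""
    outgoing: Counter[str] = Counter()
    incoming: Counter[str] = Counter()
    started: list[Move] = []
    for source, target in candidates:
        if outgoing[source] < per_node_limit and incoming[target] < per_node_limit:
            outgoing[source] += 1
            incoming[target] += 1
            started.append((source, target))
    return started


moves = [("n1", "n2"), ("n1", "n3"), ("n3", "n4"), ("n5", "n2"), ("n6", "n7")]
print(admit_rebalances(moves))
```

With a limit of 1, the second move out of `n1` and the second move into `n2` are held back, so three moves start. Because the number of admissible moves grows with the number of distinct nodes involved, a larger cluster naturally allows more concurrent rebalances without any single node being overloaded.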
Description
The cluster.routing.allocation.cluster_concurrent_rebalance setting limits the number of shards that can be rebalanced simultaneously across the whole cluster. Its default value of 2 is reasonable for a small number of shards, but it becomes a bottleneck for bigger clusters (10+ nodes). Since the new desired balance shard allocator is not affected by https://github.com/elastic/elasticsearch/issues/87279 (effectively resolved by https://github.com/elastic/elasticsearch/pull/93977), I believe we should change the default to allow big clusters to rebalance faster.
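A back-of-the-envelope sketch of the bottleneck: with a fixed cluster-wide cap, the number of sequential "waves" needed to finish a rebalance depends only on the cap, not on how many nodes are available to do the work. The function name is hypothetical, and it assumes every relocation takes the same time, which real recoveries do not.

```python
import math


def rebalance_waves(pending_moves: int, cluster_limit: int) -> int:
    """Sequential waves needed if at most cluster_limit shards relocate at
    once and every relocation takes the same time (a simplifying assumption)."""
    if cluster_limit < 1:
        raise ValueError("cluster_limit must be positive in this sketch")
    return math.ceil(pending_moves / cluster_limit)


# 200 pending moves with a cap of 2 take 100 waves whether the cluster has
# 3 nodes or 30; a cap of 20 would cut that to 10 waves.
print(rebalance_waves(200, 2), rebalance_waves(200, 20))
```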
The new default could be set to one of the following:
- cluster.routing.allocation.node_concurrent_recoveries_per_node). This approach would allow the limit to scale with the cluster size.
- cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries. This is the most aggressive option, and it may delay necessary shard movements (such as hot->warm tier migration) because of rebalances that are already in progress.
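The trade-off between the proposed defaults can be sketched as upper bounds on concurrent rebalances. This is a rough model under stated assumptions, not how the allocator computes anything: all function names are hypothetical, the scaled option is assumed to be "node count times a per-node value" (the exact formula is not given above), and the recovery-bound option assumes rebalances share the same incoming/outgoing recovery slots as other shard movements.

```python
def rebalance_ceiling_fixed(cluster_limit: int = 2) -> int:
    # Current behaviour: a cluster-wide cap independent of cluster size.
    return cluster_limit


def rebalance_ceiling_scaled(node_count: int, per_node: int = 1) -> int:
    # Hypothetical scaled default: the cap grows linearly with node count.
    return node_count * per_node


def rebalance_ceiling_recovery_bound(node_count: int, incoming: int, outgoing: int) -> int:
    # No separate rebalance cap: each relocation uses one outgoing slot on its
    # source and one incoming slot on its target, so the total cannot exceed
    # node_count * min(incoming, outgoing).
    return node_count * min(incoming, outgoing)


def free_outgoing_slots(outgoing: int, ongoing_rebalances: int) -> int:
    # Slots left on a node for other movements (e.g. hot->warm migration)
    # after rebalances take their share; 0 means those movements must wait.
    return max(outgoing - ongoing_rebalances, 0)


# A 20-node cluster, assuming incoming = outgoing = 2 for illustration.
print(rebalance_ceiling_fixed(), rebalance_ceiling_scaled(20),
      rebalance_ceiling_recovery_bound(20, 2, 2), free_outgoing_slots(2, 2))
```

The last value shows the risk of the most aggressive option: a hot node whose outgoing slots are all taken by rebalances has no room left for a tier migration until one of those rebalances finishes.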