Open sheaffej opened 3 years ago
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB docs!
John Sheaffer (sheaffej) commented:
Administrators of CockroachDB often want to understand how the software decides to rebalance replicas between nodes. We should consider adding this to the public documentation.
The high-level behavior of the rebalancing logic was discussed in an internal CRL Slack thread: https://cockroachlabs.slack.com/archives/CHKQGKYEM/p1622764788166600. A summary of that thread is below.
I propose adding the following description to the documentation.
CockroachDB's rebalancing logic classifies each store into one of three buckets based on how many replicas it contains: underfull, mean, or overfull. The system attempts to move replicas from overfull stores to underfull stores, or alternatively to stores at mean fullness. The threshold for classifying a store into one of these buckets is controlled by the cluster setting `kv.allocator.range_rebalance_threshold`, which defaults to 5%. Note: this cluster setting should not be changed unless directed by Cockroach Labs Support.

Stores are classified as overfull, underfull, or mean only relative to other stores that are considered equivalent in terms of a range's failure tolerance and any constraints applied to control replica or leaseholder placement, such as replication zones, table localities, and survival goals.
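To make the bucket classification concrete for readers, something like the following sketch could accompany the description. It is an illustration only, not CockroachDB's actual allocator code: it assumes the threshold is applied as a fraction of the mean replica count across a set of equivalent stores, and the function name `classify_stores` and the store IDs are hypothetical.

```python
from statistics import mean

# Default of kv.allocator.range_rebalance_threshold, as stated in the description.
REBALANCE_THRESHOLD = 0.05


def classify_stores(replica_counts, threshold=REBALANCE_THRESHOLD):
    """Classify equivalent stores as 'underfull', 'mean', or 'overfull'.

    replica_counts maps a store ID to its replica count. Assumption: the
    bucket boundaries sit at mean * (1 - threshold) and mean * (1 + threshold).
    """
    avg = mean(replica_counts.values())
    low = avg * (1 - threshold)
    high = avg * (1 + threshold)

    buckets = {}
    for store, count in replica_counts.items():
        if count > high:
            buckets[store] = "overfull"
        elif count < low:
            buckets[store] = "underfull"
        else:
            buckets[store] = "mean"
    return buckets


# Three equivalent stores with a mean of 100 replicas.
print(classify_stores({"s1": 120, "s2": 100, "s3": 80}))
```

With a mean of 100 replicas and the 5% default, the assumed boundaries are 95 and 105, so `s1` is overfull, `s2` is at mean, and `s3` is underfull.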
As a simplified example, consider a cluster that has 9 nodes across 3 regions where each region has 3 nodes, and each node has a single store that is considered to be equivalent by the rebalancing logic. Let's assume the default range diversity, locality, and survivability constraints exist. In this case, it would be expected that each region would contain a replica of a range, and the replicas in a single region would be balanced across the 3 nodes using the overfull, mean, underfull buckets and rebalancing logic described above.
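The 9-node example could be sketched as below. This is a simplified model under stated assumptions, not the real rebalancer: the stores within a region are treated as the only equivalent set, the threshold is applied as a fraction of the per-region mean, and the region names, node names, replica counts, and the `suggest_move` helper are all hypothetical.

```python
from statistics import mean


def suggest_move(counts, threshold=0.05):
    """Suggest one (source, target) replica move among equivalent stores.

    Prefers an underfull target and falls back to a store at mean fullness.
    Returns None when no store is overfull.
    """
    avg = mean(counts.values())
    low, high = avg * (1 - threshold), avg * (1 + threshold)

    overfull = [s for s, c in counts.items() if c > high]
    if not overfull:
        return None
    source = max(overfull, key=counts.get)

    underfull = [s for s, c in counts.items() if c < low]
    at_mean = [s for s, c in counts.items() if low <= c <= high]
    candidates = underfull or at_mean
    if not candidates:
        return None
    target = min(candidates, key=counts.get)
    return source, target


# 3 regions x 3 nodes, one store per node; values are replica counts.
regions = {
    "us-east": {"n1": 130, "n2": 100, "n3": 70},
    "us-west": {"n4": 100, "n5": 101, "n6": 99},
    "eu-west": {"n7": 108, "n8": 96, "n9": 96},
}

# Balance is evaluated only among the equivalent stores within each region.
moves = {region: suggest_move(counts) for region, counts in regions.items()}
for region, move in moves.items():
    print(region, move)
```

In this sketch `us-east` moves a replica from `n1` to the underfull `n3`, `us-west` is already balanced, and `eu-west` has no underfull store, so the overfull `n7` sheds a replica to `n8`, which is at mean fullness.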
I think an appropriate place for this would be in the Storage section of the Architecture Reference: https://www.cockroachlabs.com/docs/v21.1/architecture/storage-layer.html
However, it may be good to add it to the See Also of the following pages:
Jira Issue: DOC-1561