cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver: replica rebalancing should not be depended upon for constraint satisfaction #90110

Open kvoli opened 1 year ago

kvoli commented 1 year ago

Is your feature request related to a problem? Please describe.

When a range has the correct number of voters and non-voters, but the num_replicas required by its constraints is not matched, the replicate queue repairs the range by rebalancing.

e.g. when moving between these two configurations, we are dependent on rebalancing to achieve the desired constraints:

   v  nv          v  nv
a  2  1   -->  a  1  1
b  1  0        b  2  0

This is shown in the failing repro below:

https://github.com/kvoli/cockroach/tree/221017.repair-rebalancing

This lack of distinction between constraint-driven and capacity-driven rebalancing makes it difficult to separate these components in the replicate queue logic, as replica rebalancing is actually depended upon for zone config constraint satisfaction.
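To make the a/b example above concrete, here is a minimal Go sketch of why the range looks healthy by replica count yet still violates its constraints. The `placement` type and the helper functions are hypothetical and written only for illustration; they are not taken from the kvserver code.

```go
package main

import "fmt"

// placement counts the voters and non-voters in one locality.
// Hypothetical type for illustration; not a CockroachDB type.
type placement struct {
	voters, nonVoters int
}

// totals sums voters and non-voters across all localities.
func totals(p map[string]placement) (v, nv int) {
	for _, c := range p {
		v += c.voters
		nv += c.nonVoters
	}
	return v, nv
}

// needsCountRepair reports whether the total voter or non-voter count
// differs from the desired total (the case an add/remove would fix).
func needsCountRepair(cur, want map[string]placement) bool {
	cv, cnv := totals(cur)
	wv, wnv := totals(want)
	return cv != wv || cnv != wnv
}

// violatesConstraints reports whether any locality holds a different
// number of voters or non-voters than the desired placement asks for.
func violatesConstraints(cur, want map[string]placement) bool {
	for loc, w := range want {
		if cur[loc] != w {
			return true
		}
	}
	for loc, c := range cur {
		if _, ok := want[loc]; !ok && c != (placement{}) {
			return true
		}
	}
	return false
}

func main() {
	current := map[string]placement{"a": {2, 1}, "b": {1, 0}}
	desired := map[string]placement{"a": {1, 1}, "b": {2, 0}}

	fmt.Println("count repair needed:", needsCountRepair(current, desired))
	fmt.Println("constraints violated:", violatesConstraints(current, desired))
}
```

In this sketch the totals match (3 voters, 1 non-voter on both sides), so nothing would be flagged as a count repair, while the per-locality check still reports a violation. That gap is the case that currently falls through to rebalancing.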

Describe the solution you'd like

Split the rebalancing operations that are "necessary" due to zone constraints from those that occur due to capacity (storage, range count, etc.).

This should be a separate action in the allocator.
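One possible shape for that split, sketched in Go under stated assumptions: the `action` values, the `rangeState` fields, and `computeAction` are all hypothetical names invented here, and do not correspond to the allocator's real identifiers or priorities.

```go
package main

import "fmt"

// action is a hypothetical stand-in for an allocator action.
type action int

const (
	noop action = iota
	// rebalanceForCapacity is optional work driven by storage, range count, etc.
	rebalanceForCapacity
	// rebalanceForConstraints is the proposed separate action: a move that
	// is required for the zone config constraints to be satisfied.
	rebalanceForConstraints
)

func (a action) String() string {
	switch a {
	case rebalanceForCapacity:
		return "rebalance (capacity)"
	case rebalanceForConstraints:
		return "rebalance (constraints)"
	default:
		return "noop"
	}
}

// rangeState is a hypothetical summary of what is known about a range.
type rangeState struct {
	constraintsViolated bool
	capacityImbalanced  bool
}

// computeAction picks the constraint-driven action first, so necessary
// moves are never hidden behind (or disabled with) optional ones.
func computeAction(s rangeState) action {
	switch {
	case s.constraintsViolated:
		return rebalanceForConstraints
	case s.capacityImbalanced:
		return rebalanceForCapacity
	default:
		return noop
	}
}

func main() {
	fmt.Println(computeAction(rangeState{constraintsViolated: true, capacityImbalanced: true}))
	fmt.Println(computeAction(rangeState{capacityImbalanced: true}))
	fmt.Println(computeAction(rangeState{}))
}
```

Keeping the two as distinct actions would let the replicate queue treat the constraint-driven case like a repair while leaving capacity-driven rebalancing as best-effort.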

Jira issue: CRDB-20587

kvoli commented 1 year ago

cc @KaiSun314

AlexTalks commented 1 year ago

Commenting here to note that in the replica replacement case this is made a bit better by #94810. However, this is still an issue when new or updated constraints need to be applied.