cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.04k stars 3.8k forks source link

kvserver: store rebalancer should use change replicas instead of relocate range #82539

Open kvoli opened 2 years ago

kvoli commented 2 years ago

The store rebalancer attempts to balance the qps of stores within the cluster. After first trying lease transfers [code], it will attempt reconfiguring replicas for hot ranges it holds a lease for.

This reconfiguration is currently enacted by calling AdminRelocateRange [code]. This was previously necessary prior to #77841, as RelocateRange would handle transferring the lease in a multi step change, as opposed to ChangeReplicas. This is no longer necessary.

It is likewise desirable to limit bespoke lease transfer logic that doesn't respect lease preferences, as the existing logic also attempts to transfer the lease to the lowest qps replica of the resulting voting replica set.[code].

The store rebalancer should instead call ChangeReplicas [code]. Using the replicate queue as a proxy for this call, similar to TransferLease requests [code].

call graph

image

misc

This will have the affect of fixing the lease preference issue, which isn't respected by the relocate range mentioned here https://github.com/cockroachdb/cockroach/issues/82539.

Jira issue: CRDB-16585

kvoli commented 2 years ago

simplifies #82483

aayushshah15 commented 2 years ago

Note that the StoreRebalancer might decide to rebalance multiple voters through that one AdminRelocateRange call. Have we considered what we're going to do about that?

kvoli commented 2 years ago

AdminChangeReplicas should be able to accommodate this if I understand the semantics correctly. The only issue is where the store rebalancer will remove the voter it holds, including the lease - in which case it won't reach parity here. However, the first phase of the store rebalancer loop does these transfers and I'm hoping removing transfers in the second phase will simplify things.

https://github.com/cockroachdb/cockroach/blob/262dabe03eb50801f89fd5c731449e725958a6fa/pkg/kv/kvserver/replica_command.go#L1003

kvoli commented 2 years ago

updated

should be able to accommodate this ...

Not as simply as I initially thought. Talking to @aayushshah15, change replicas doesn't explicitly have the same guarantees as relocate range in the order it changes the replica set.

https://github.com/cockroachdb/cockroach/blob/babf5a977e3a20b4114c0773e197f070663164bd/pkg/kv/kvserver/replica_command.go#L1030-L1035

However it does seem approx. equivalent in the order with which it executes changes.

The other issue is that the store rebalancer loop does assume that every relocateRange call, it will be transferring the lease away from the current store.

Demonstrated by updating the local accounting: https://github.com/cockroachdb/cockroach/blob/babf5a977e3a20b4114c0773e197f070663164bd/pkg/kv/kvserver/store_rebalancer.go#L349-L350

Then also by one of the loop termination conditions (the other being when there are no actions available).

https://github.com/cockroachdb/cockroach/blob/babf5a977e3a20b4114c0773e197f070663164bd/pkg/kv/kvserver/store_rebalancer.go#L293-L293

I'm running some tests to figure out how equivalent using change replicas with the same set of changes is. Likewise, where the second phase only moves one replica at a time and then loops back to the lease transfer (phase 1) of the loop.