cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.97k stars 3.79k forks source link

kv: reject lease proposal locally when leader not known #118435

Open kvoli opened 8 months ago

kvoli commented 8 months ago

Is your feature request related to a problem? Please describe.

If a replica doesn't know who the leader is (possible just rejoined cluster) we still let the lease acquisition request be proposed, despite knowing it will fail.

https://github.com/cockroachdb/cockroach/blob/59ad05a0e43d4c00307a0011d80e76f50549eb39/pkg/kv/kvserver/replica_proposal_buf.go#L685

This can hang if the replica is behind and is still catching up e.g. just rejoined the cluster.

Describe the solution you'd like

Reject the proposal, similar to the known leader case and return a NotLeaseHolderError.

https://github.com/cockroachdb/cockroach/blob/59ad05a0e43d4c00307a0011d80e76f50549eb39/pkg/kv/kvserver/replica_proposal_buf.go#L703

Additional context Related #118266

Jira issue: CRDB-35731

Epic CRDB-37522

blathers-crl[bot] commented 8 months ago

cc @cockroachdb/replication

erikgrinaker commented 8 months ago

I'm not sure this will work. If the range does not have quorum nor a lease, and we don't attempt to propose a lease, then the replica circuit breakers will never trip and the client will keep retrying indefinitely -- exactly the scenario replica circuit breakers are meant to avoid.

If we are to do this, then we'll have to implement circuit breakers at the DistSender level instead. That may come for free with #105168.

nvanbenschoten commented 7 months ago

If the range does not have quorum nor a lease, and we don't attempt to propose a lease, then the replica circuit breakers will never trip and the client will keep retrying indefinitely -- exactly the scenario replica circuit breakers are meant to avoid.

How does proposing a lease help the range reach a quorum and avoid the indefinite retry? When a replica is in this state where it does not know the leader, it is simultaneously in the process of campaigning and trying to establish or learn leadership. Placing a proposal to sit idle in Replica.mu.proposals and wedging a client doesn't really help anybody. Returning a NotLeaseHolderError allows the client to retry on other replicas, allowing it to search for the leaseholder.

Are you saying that we will need client-side intervention in the case where all replicas are disconnected and no leader can be elected, to avoid an infinite NotLeaseHolderError retry loop? If so, I agree. The DistSender inTransferRetry backoff loop can help this from spinning, but we probably also want to integrate this with client-side circuit breakers.

This does feel like the client's responsibility though, right?

erikgrinaker commented 7 months ago

Yeah, I think that's a better way to handle it.

Today though, we rely on replica circuit breakers to avoid clients getting permanently stuck on an unavailable range (e.g. one that's lost quorum). If we simply remove this lease acquisition without also adding client-side circuit breakers then we're effectively removing that protection, leaving clients to get stuck on unavailable ranges again.

As a short-term, backportable fix I think we should get https://github.com/cockroachdb/cockroach/pull/118737 in now. Then once we have range-level DistSender circuit breakers we can reject these proposals (and consider removing replica circuit breakers entirely). Note that range-level circuit breakers are out of scope for https://github.com/cockroachdb/cockroach/pull/118943, and probably not landing in 24.1 given stability is nearly upon us.

erikgrinaker commented 7 months ago

How does proposing a lease help the range reach a quorum and avoid the indefinite retry? When a replica is in this state where it does not know the leader, it is simultaneously in the process of campaigning and trying to establish or learn leadership.

One additional aspect here is that if the range is quiesced but has lost its leaseholder/leader, we have to wake up the replica so that it can actually hold an election and gain leadership. Proposing a lease acquisition will indirectly do that. If we reject lease acquisitions then we need to make sure we still unquiesce the range -- it's possible that it already does so, given we only reject this in the proposal buffer, but we should check and add a test case for this.

nvanbenschoten commented 7 months ago

One additional aspect here is that if the range is quiesced but has lost its leaseholder/leader, we have to wake up the replica so that it can actually hold an election and gain leadership. Proposing a lease acquisition will indirectly do that. If we reject lease acquisitions then we need to make sure we still unquiesce the range -- it's possible that it already does so, given we only reject this in the proposal buffer, but we should check and add a test case for this.

+1 to all of this. There is a call to maybeUnquiesceLocked in propBuf.flushLocked and some discussion around unquiesceAndWakeLeader in handleRaftReadyRaftMuLocked's call to propBuf.FlushLockedWithRaftGroup. We would need to make sure that we properly handle rejecting the lease request but still unquiescing in both of these paths. From a quick reading of the code, it looks like we do. A test case would be worthwhile.