Open kvoli opened 8 months ago
cc @cockroachdb/replication
I'm not sure this will work. If the range does not have quorum nor a lease, and we don't attempt to propose a lease, then the replica circuit breakers will never trip and the client will keep retrying indefinitely -- exactly the scenario replica circuit breakers are meant to avoid.
If we are to do this, then we'll have to implement circuit breakers at the DistSender level instead. That may come for free with #105168.
If the range does not have quorum nor a lease, and we don't attempt to propose a lease, then the replica circuit breakers will never trip and the client will keep retrying indefinitely -- exactly the scenario replica circuit breakers are meant to avoid.
How does proposing a lease help the range reach a quorum and avoid the indefinite retry? When a replica is in this state where it does not know the leader, it is simultaneously in the process of campaigning and trying to establish or learn leadership. Placing a proposal to sit idle in Replica.mu.proposals
and wedging a client doesn't really help anybody. Returning a NotLeaseHolderError
allows the client to retry on other replicas, allowing it to search for the leaseholder.
Are you saying that we will need client-side intervention in the case where all replicas are disconnected and no leader can be elected, to avoid an infinite NotLeaseHolderError
retry loop? If so, I agree. The DistSender inTransferRetry
backoff loop can help this from spinning, but we probably also want to integrate this with client-side circuit breakers.
This does feel like the client's responsibility though, right?
Yeah, I think that's a better way to handle it.
Today though, we rely on replica circuit breakers to avoid clients getting permanently stuck on an unavailable range (e.g. one that's lost quorum). If we simply remove this lease acquisition without also adding client-side circuit breakers then we're effectively removing that protection, leaving clients to get stuck on unavailable ranges again.
As a short-term, backportable fix I think we should get https://github.com/cockroachdb/cockroach/pull/118737 in now. Then once we have range-level DistSender circuit breakers we can reject these proposals (and consider removing replica circuit breakers entirely). Note that range-level circuit breakers are out of scope for https://github.com/cockroachdb/cockroach/pull/118943, and probably not landing in 24.1 given stability is nearly upon us.
How does proposing a lease help the range reach a quorum and avoid the indefinite retry? When a replica is in this state where it does not know the leader, it is simultaneously in the process of campaigning and trying to establish or learn leadership.
One additional aspect here is that if the range is quiesced but has lost its leaseholder/leader, we have to wake up the replica so that it can actually hold an election and gain leadership. Proposing a lease acquisition will indirectly do that. If we reject lease acquisitions then we need to make sure we still unquiesce the range -- it's possible that it already does so, given we only reject this in the proposal buffer, but we should check and add a test case for this.
One additional aspect here is that if the range is quiesced but has lost its leaseholder/leader, we have to wake up the replica so that it can actually hold an election and gain leadership. Proposing a lease acquisition will indirectly do that. If we reject lease acquisitions then we need to make sure we still unquiesce the range -- it's possible that it already does so, given we only reject this in the proposal buffer, but we should check and add a test case for this.
+1 to all of this. There is a call to maybeUnquiesceLocked
in propBuf.flushLocked
and some discussion around unquiesceAndWakeLeader
in handleRaftReadyRaftMuLocked
's call to propBuf.FlushLockedWithRaftGroup
. We would need to make sure that we properly handle rejecting the lease request but still unquiescing in both of these paths. From a quick reading of the code, it looks like we do. A test case would be worthwhile.
Is your feature request related to a problem? Please describe.
If a replica doesn't know who the leader is (possible just rejoined cluster) we still let the lease acquisition request be proposed, despite knowing it will fail.
https://github.com/cockroachdb/cockroach/blob/59ad05a0e43d4c00307a0011d80e76f50549eb39/pkg/kv/kvserver/replica_proposal_buf.go#L685
This can hang if the replica is behind and is still catching up e.g. just rejoined the cluster.
Describe the solution you'd like
Reject the proposal, similar to the known leader case and return a
NotLeaseHolderError
.https://github.com/cockroachdb/cockroach/blob/59ad05a0e43d4c00307a0011d80e76f50549eb39/pkg/kv/kvserver/replica_proposal_buf.go#L703
Additional context Related #118266
Jira issue: CRDB-35731
Epic CRDB-37522