Open arulajmani opened 1 year ago
Hi @andrewbaptist, please add branch-* labels to identify which branch(es) this release-blocker affects.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
After looking through this some more and talking to @arulajmani it appears that this code has existed for a long time. This was likely exposed because we had follower pausing on for this cluster and saw follower pausing around the time of this issue.
This is a good "first project" in the range cache code as the fix is not overly complex but requires writing a good test to expose it.
Describe the problem
When the
DistSender
receives aRangeKeyMismatchErrors
it simply terminates its inner loop (sendToReplicas
) and evicts the range descriptor. In doing so, it updates its range cache with the range descriptors in the error; see:https://github.com/cockroachdb/cockroach/blob/5dd802945591fce48b1e357e4a9f91ec13f82057/pkg/kv/kvclient/kvcoord/dist_sender.go#L1895-L1908
Now, if the error is returned by a replica that's lagging (say it hasn't applied a split or a merge yet), we'll end up needlessly evicting the range descriptor. Worse yet, if on the subsequent retry the sender picks this replica[1] as the first one to route to again, we'll find ourselves in an infinite loop until this replica is caught up.
Proposed solution
We know when a
RangeKeyMismatchError
is originating from a replica that isn't caught up yet, by virtue of the descriptor generation in the first entry of itsRanges
slice. Currently,sendToReplicas
short circuits without peeking into this slice:https://github.com/cockroachdb/cockroach/blob/5dd802945591fce48b1e357e4a9f91ec13f82057/pkg/kv/kvclient/kvcoord/dist_sender.go#L2533-L2536
Maybe it should. That is, it should only terminate its iteration if the
RangeKeyMismatch
error is coming from a replica that has a descriptor generation newer than what's in the range cache.cc @kvoli @andrewbaptist
[1] This can happen if the sender believes this replica has the lease or if it has no lease information and the replica is closest to it.
Jira issue: CRDB-28967