cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvclient: route requests via followers when leaseholder is unreachable #93503

Closed erikgrinaker closed 6 months ago

erikgrinaker commented 1 year ago

During a partial network partition, for example, a SQL gateway may be unable to reach a leaseholder directly even though other nodes can reach it just fine (see internal document). In these cases requests currently stall in indefinite retry loops (although we plan to implement circuit breakers in #93501).

We stall because when the DistSender is unable to reach the current leaseholder it tries to use one of the other replicas in the range, but these will simply return a NotLeaseHolderError pointing the DistSender right back to the unreachable leaseholder. Instead, when the DistSender has already tried the current leaseholder it could signal this in the request and the follower could try to process the request, e.g. via:

This can be further optimized by e.g. sending the request to all followers in parallel, or keeping statistics about which followers are able to proxy to the leaseholder and how efficiently they can do so.
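To make the proposed fallback concrete, here is a rough toy model in Go. It is only a sketch and not the actual DistSender code: `NodeID`, `Range`, `Send`, and the `canReach` predicate are hypothetical names, and connectivity is modeled as a plain function rather than real RPCs. The gateway tries the leaseholder first, then asks each reachable follower to proxy instead of bouncing back a NotLeaseHolderError.

```go
package main

import "errors"

// NodeID identifies a node. All names in this sketch are hypothetical.
type NodeID int

// ErrUnreachable is returned when neither the leaseholder nor any
// proxying follower can be reached from the gateway.
var ErrUnreachable = errors.New("leaseholder unreachable, even via followers")

// Range is a simplified range descriptor: a leaseholder plus all replicas.
type Range struct {
	Leaseholder NodeID
	Replicas    []NodeID
}

// Send returns the path a request would take from the gateway to the
// leaseholder. canReach(from, to) models whether from can connect to to.
func Send(canReach func(from, to NodeID) bool, gateway NodeID, r Range) ([]NodeID, error) {
	// Normal case: go straight to the leaseholder.
	if canReach(gateway, r.Leaseholder) {
		return []NodeID{gateway, r.Leaseholder}, nil
	}
	// Fallback: the leaseholder has been tried, so ask a follower to
	// proxy the request rather than redirect back to the leaseholder.
	for _, f := range r.Replicas {
		if f == r.Leaseholder || !canReach(gateway, f) {
			continue
		}
		if canReach(f, r.Leaseholder) {
			return []NodeID{gateway, f, r.Leaseholder}, nil
		}
	}
	return nil, ErrUnreachable
}
```

With a gateway n1 that cannot reach leaseholder n2 but can reach follower n3, `Send` in this model returns the one-hop path n1, n3, n2. Trying followers sequentially is the simplest choice; the parallel or statistics-driven variants mentioned above would replace the loop.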

Jira issue: CRDB-22370

Epic CRDB-25200

nvanbenschoten commented 1 year ago

Redirecting KV requests through even just a single follower replica ("one-hop proxying") allows CRDB to make a reasonably easy-to-understand availability claim. Pending other availability work, ranges should always remain available if a majority of their replicas are connected. This proxying work ensures that load originating from a gateway will remain available as long as it can communicate with any replica in that majority. Since any two majorities overlap, this project will allow us to say that regardless of how a network is (partially) partitioned between replicas, a workload is guaranteed to remain available if it is routed to a gateway that can communicate with at least a majority of the replicas in the range that it is reading from/writing to. Of course, it may be more available than this in many cases, but this would be a useful worst-case guarantee to make. It starts to look like the availability requirement for a leaderless system.
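The overlap argument can be checked with a little arithmetic. The sketch below is illustrative only: `Quorum`, `MinOverlap`, and `ProxyCandidates` are hypothetical helpers, and replicas are represented as plain integers. Two majorities of n replicas share at least 2*(n/2+1) - n members, which is always at least one, so a gateway that reaches a majority can always reach some replica in the connected majority.

```go
package main

// Quorum returns the size of a majority of n replicas.
func Quorum(n int) int {
	return n/2 + 1
}

// MinOverlap returns the smallest possible number of replicas shared by
// any two majorities of n replicas (pigeonhole: 2*quorum - n).
func MinOverlap(n int) int {
	return 2*Quorum(n) - n
}

// ProxyCandidates returns the replicas that the gateway can reach and
// that also belong to the connected majority. Under the assumptions of
// this sketch, any of them could proxy a request to the leaseholder.
func ProxyCandidates(gatewayReach, connectedMajority []int) []int {
	inMajority := make(map[int]bool, len(connectedMajority))
	for _, r := range connectedMajority {
		inMajority[r] = true
	}
	var out []int
	for _, r := range gatewayReach {
		if inMajority[r] {
			out = append(out, r)
		}
	}
	return out
}
```

For example, with five replicas, a gateway that reaches replicas 1, 2, and 3 while the connected majority is 3, 4, and 5 is left with replica 3 as its single proxy candidate.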

blathers-crl[bot] commented 7 months ago

Hi @andrewbaptist, please add branch-* labels to identify which branch(es) this release-blocker affects.