cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

raft: make commit index delivery in MsgApp reliable #125266

Closed pav-kv closed 6 days ago

pav-kv commented 3 months ago

The Raft leader sends the commit index to followers via two channels: MsgApp messages and heartbeats.

The MsgApp channel does not fully guarantee liveness for follower commit index updates. The leader heuristically checks whether the follower's commit index might be outdated, but the follower sends no feedback confirming that its commit index was bumped. If the latest MsgApp is dropped and the follower's commit index is not updated, the leader will not necessarily send another one.

Heartbeats also carry the Commit index and are sent regularly, so they help close this liveness gap when MsgApp delivery is lossy.

We want to make the MsgApp channel reliable and remove the dependency on heartbeats. To do that, we need to introduce feedback from the follower, and have the leader track something equivalent to the durable Match log index, but for the Commit index. The leader will not stop sending MsgApp pings to a (connected) follower until it knows that the latest commit index is durable on that follower.
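
To make the proposal concrete, here is a rough sketch of the leader-side bookkeeping. It is not the actual raft package code: the names `followerState`, `leaderState`, `ackedCommit`, `shouldSendMsgApp`, and `handleCommitAck` are hypothetical, and capping the target at the follower's Match index is an assumption made for illustration.

```go
package main

// followerState is a hypothetical per-follower record kept by the leader.
type followerState struct {
	match       uint64 // highest log index known to be durable on the follower
	ackedCommit uint64 // highest commit index the follower has confirmed (assumed new field)
}

// leaderState is a hypothetical, heavily simplified view of the leader.
type leaderState struct {
	commit    uint64
	followers map[uint64]*followerState
}

// commitTarget returns the commit index the follower can be expected to reach.
// Assumption: a follower cannot commit past the entries it already has, so the
// target is capped at its Match index.
func (l *leaderState) commitTarget(f *followerState) uint64 {
	if f.match < l.commit {
		return f.match
	}
	return l.commit
}

// shouldSendMsgApp reports whether the leader still owes the follower a MsgApp
// carrying the commit index, i.e. the follower has not yet confirmed the target.
func (l *leaderState) shouldSendMsgApp(id uint64) bool {
	f, ok := l.followers[id]
	if !ok {
		return false
	}
	return f.ackedCommit < l.commitTarget(f)
}

// handleCommitAck records the follower's feedback. Stale or reordered
// acknowledgements are ignored so that ackedCommit only moves forward.
func (l *leaderState) handleCommitAck(id, commit uint64) {
	f, ok := l.followers[id]
	if !ok {
		return
	}
	if commit > f.ackedCommit {
		f.ackedCommit = commit
	}
}
```

With this shape, a dropped MsgApp leaves `ackedCommit` behind the target, so `shouldSendMsgApp` keeps returning true and the leader retries without relying on a heartbeat to carry the commit index.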

Jira issue: CRDB-39357

Epic CRDB-39898

blathers-crl[bot] commented 3 months ago

cc @cockroachdb/replication