cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

raft: handle snapshots with outdated term #127348

Open pav-kv opened 3 months ago

pav-kv commented 3 months ago

Before stepping a MsgSnap into raft, we bump its term to the receiver's Term, to force it through raft's term checks. This would have been fine (because the snapshot carries committed state and can always be handled), but raft assumes the snapshot was sent by the leader of term MsgSnap.Term, and updates its state accordingly [1, 2]. Since the MsgSnap term may have been bumped arbitrarily, these transitions can be incorrect: the receiver will falsely believe that the snapshot originator is the leader at a different term.

The lead field is used for a bunch of things:

To fix this, we need to remove the term bump and allow raft to handle snapshots at outdated terms. A snapshot can always be applied because it carries committed state, which is irreversible. The one exception is a snapshot carrying committed state that the receiver already has, which is easy to check for.
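
As a rough illustration of the intended behavior, here is a minimal Go sketch. It is not the actual raft code: the `snapshotMsg` and `follower` types and the `handleSnapshot` method are hypothetical and heavily simplified. It assumes the message keeps its original term (no bump), so the receiver only learns about a leader when the term is current or newer, and it skips snapshots that carry nothing beyond what is already committed.

```go
package main

// snapshotMsg is a hypothetical, simplified stand-in for a MsgSnap.
type snapshotMsg struct {
	From  uint64 // ID of the replica that sent the snapshot
	Term  uint64 // term at which the sender sent it (never bumped)
	Index uint64 // last log index covered by the snapshot
}

// follower is a hypothetical, simplified view of the receiver's raft state.
type follower struct {
	Term      uint64
	Lead      uint64 // 0 means "no known leader"
	Committed uint64
}

// handleSnapshot accepts a snapshot at any term and reports whether it was
// applied. Leadership is inferred from the message only if its term is not
// outdated; the committed state is taken regardless of the term.
func (f *follower) handleSnapshot(m snapshotMsg) bool {
	switch {
	case m.Term > f.Term:
		// A newer term: adopt it and treat the sender as its leader.
		f.Term, f.Lead = m.Term, m.From
	case m.Term == f.Term && f.Lead == 0:
		// Current term with no known leader: learn the leader.
		f.Lead = m.From
	}
	// For m.Term < f.Term, Term and Lead are deliberately left untouched:
	// the sender is not known to be the leader at the receiver's term.

	if m.Index <= f.Committed {
		// The snapshot carries only state the receiver already has.
		return false
	}
	f.Committed = m.Index
	return true
}
```

With this shape, a follower at term 5 that receives a snapshot sent at term 4 still advances its committed index, but keeps its existing term and leader.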

Jira issue: CRDB-40401

Epic CRDB-39898

blathers-crl[bot] commented 3 months ago

Hi @pav-kv, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.