cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

raft: introduce persistent leader term in raft log #122446

Open pav-kv opened 6 months ago

pav-kv commented 6 months ago

The raft log currently does not "remember" the term of the last leader that appended entries to it. The state of a raft instance contains the Term of its latest vote, but the leader of that Term may or may not have appended to this log yet. This means that the content of the log is not necessarily a prefix of this Term's leader's log.

The impact of this manifests in multiple ways:

We should introduce a "leader term" field into the state (both the HardState and the in-memory state of the raft log), with the following invariant:

Log.Entry[last].Term <= LeaderTerm <= Term
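As a rough illustration, the invariant is a simple ordering over three terms. This is a minimal Go sketch; the logState type and checkInvariant function are hypothetical names, not part of etcd/raft or CockroachDB.

```go
package main

import "fmt"

// logState is a hypothetical, simplified view of a raft replica's state.
// The names are illustrative only, not the etcd/raft or CockroachDB API.
type logState struct {
	Term          uint64 // HardState.Term
	LeaderTerm    uint64 // proposed field: term of the last leader whose append the log accepted
	LastEntryTerm uint64 // term of Log.Entry[last]
}

// checkInvariant verifies Log.Entry[last].Term <= LeaderTerm <= Term.
func checkInvariant(s logState) error {
	if s.LastEntryTerm > s.LeaderTerm {
		return fmt.Errorf("last entry term %d > leader term %d", s.LastEntryTerm, s.LeaderTerm)
	}
	if s.LeaderTerm > s.Term {
		return fmt.Errorf("leader term %d > term %d", s.LeaderTerm, s.Term)
	}
	return nil
}

func main() {
	// Voted in term 7; the log was last written by the term-5 leader,
	// which so far has only replicated entries from term 4.
	fmt.Println(checkInvariant(logState{Term: 7, LeaderTerm: 5, LastEntryTerm: 4})) // <nil>

	// LeaderTerm can never run ahead of Term.
	fmt.Println(checkInvariant(logState{Term: 5, LeaderTerm: 6, LastEntryTerm: 4})) // leader term 6 > term 5
}
```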

LeaderTerm should be updated to the leader's term every time the log accepts an append from that leader. LeaderTerm can be used for safety checks on the follower before advancing the commit index. It can also be used for a simpler async log protocol.
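A minimal sketch of how a follower might maintain LeaderTerm and use it to guard commit-index advancement. The follower type, its methods, and the exact commit rule are hypothetical; the sketch assumes the usual raft append consistency check has already passed and models only the term bookkeeping. The intuition behind the check: if LeaderTerm equals the sender's term, the local log is consistent with (a prefix of) that leader's log, so its commit index can be applied up to the local last index.

```go
package main

import (
	"errors"
	"fmt"
)

// follower is a hypothetical model of the follower-side raft state relevant
// to LeaderTerm. It is not the etcd/raft or CockroachDB API.
type follower struct {
	term       uint64 // HardState.Term
	leaderTerm uint64 // proposed LeaderTerm
	lastIndex  uint64 // index of Log.Entry[last]
	lastTerm   uint64 // term of Log.Entry[last]
	commit     uint64 // commit index
}

var (
	errStale   = errors.New("append from a stale leader")
	errInvalid = errors.New("entry term exceeds leader term")
)

// handleAppend models accepting an append from the leader at term `from`.
// Assumption: the consistency check already passed and the log now ends at
// (lastIndex, lastTerm), matching that leader's log.
func (f *follower) handleAppend(from, lastIndex, lastTerm uint64) error {
	if from < f.term {
		return errStale
	}
	if lastTerm > from {
		return errInvalid // a leader never sends entries from a later term
	}
	f.term = from
	f.leaderTerm = from // the log is now consistent with this leader's log
	f.lastIndex, f.lastTerm = lastIndex, lastTerm
	return nil
}

// observeTerm models voting in (or learning of) a higher term without
// receiving an append: Term moves forward, LeaderTerm does not.
func (f *follower) observeTerm(t uint64) {
	if t > f.term {
		f.term = t
	}
}

// maybeCommit models a commit index carried by a message from the leader at
// term `from`. Hypothetical safety check: advance only when LeaderTerm
// matches the sender, and never past the local last index.
func (f *follower) maybeCommit(from, commit uint64) bool {
	if from != f.leaderTerm {
		return false
	}
	if commit > f.lastIndex {
		commit = f.lastIndex
	}
	if commit > f.commit {
		f.commit = commit
	}
	return true
}

func main() {
	f := &follower{}
	_ = f.handleAppend(5, 10, 5)      // term-5 leader appended up to index 10
	f.observeTerm(7)                  // voted in term 7; log unchanged
	fmt.Println(f.maybeCommit(7, 10)) // false: log not yet confirmed by the term-7 leader

	_ = f.handleAppend(7, 12, 7) // term-7 leader appends
	ok := f.maybeCommit(7, 11)
	fmt.Println(ok, f.commit) // true 11
}
```

Note that Term alone could not drive this check: after the vote, Term is already 7 even though nothing in the log came from the term-7 leader.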

Ultimately, LeaderTerm is the missing piece of state that makes the Raft log equivalent to a Paxos acceptor (TODO: link to the doc). Under this equivalence, the LeaderTerm of the log corresponds to the "max accepted proposal ID" in Paxos.

The introduction of LeaderTerm can be done as:

  1. A start-up migration that initializes LeaderTerm = Log.Entry[last].Term.
  2. Followed by running the code that maintains this field.
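The migration step could look roughly like the following Go sketch. It assumes, hypothetically, that a zero LeaderTerm means the field was never written; hardState and migrateLeaderTerm are illustrative names. Initializing from the last entry's term is sound under Raft's Log Matching property: the leader of that term holds that entry, so the whole local log is a prefix of its log, and the last entry's term never exceeds Term, so the invariant holds.

```go
package main

import "fmt"

// hardState is a hypothetical persisted state carrying the proposed field.
// Assumption: LeaderTerm == 0 means the field was never written, e.g. by a
// binary that predates it.
type hardState struct {
	Term       uint64
	Vote       uint64
	Commit     uint64
	LeaderTerm uint64
}

// migrateLeaderTerm sketches step 1: initialize LeaderTerm from the term of
// the last log entry if it is unset. An already-set value is left alone, so
// running it on every start-up is harmless.
func migrateLeaderTerm(hs hardState, lastEntryTerm uint64) hardState {
	if hs.LeaderTerm == 0 {
		hs.LeaderTerm = lastEntryTerm
	}
	return hs
}

func main() {
	// The log ends with an entry from term 8; the replica has since voted in term 9.
	hs := migrateLeaderTerm(hardState{Term: 9, Vote: 2, Commit: 40}, 8)
	fmt.Println(hs.LeaderTerm) // 8
}
```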

Jira issue: CRDB-37894

Epic CRDB-39898

blathers-crl[bot] commented 6 months ago

cc @cockroachdb/replication

nvanbenschoten commented 6 months ago

We should introduce a "leader term" field into the state (both the HardState and the in-memory state of the raft log)

We are also planning to persist Lead into HardState in the near future.

When we do so, we should be careful not to create confusion by also introducing a field called LeaderTerm, which is not guaranteed to correspond to either HardState.Lead or HardState.Term.

pav-kv commented 6 months ago

Naming-wise, I think something like AccTerm / AcceptedTerm would work, and it would also align with Paxos terminology. Another option is LogTerm, meaning that the state of the log is consistent with the leader at this term.

pav-kv commented 6 months ago

@nvanbenschoten The Lead field will be guaranteed to match the Term though, right? We'll either have {Term=t, Lead=0} meaning that we don't know the leader yet, or {Term=t, Lead=n} meaning that we've learned the current-term leader. In both cases, AccTerm <= Term, and only reflects the state of the log.

lyang24 commented 5 months ago

Do we need the leader term in both the soft and hard state?

nvanbenschoten commented 5 months ago

The Lead field will be guaranteed to match the Term though, right? We'll either have {Term=t, Lead=0} meaning that we don't know the leader yet, or {Term=t, Lead=n} meaning that we've learned the current-term leader.

Yes, that is correct.