Open pav-kv opened 6 months ago
cc @cockroachdb/replication
We should introduce a "leader term" field into the state (both the HardState and the in-memory state of the raft log)
We are also planning to persist Lead into HardState in the near future.
When we do so, we should be sure to not create confusion by also introducing a field called LeaderTerm, which is neither guaranteed to be HardState.Lead's term nor HardState.Term's leader.
Naming-wise, I think we would be good with something like AccTerm / AcceptedTerm. This would also align with Paxos terminology. Or LogTerm, meaning that the state of the log is consistent with the leader at this term.
@nvanbenschoten The Lead field will be guaranteed to match the Term though, right? We'll either have {Term=t, Lead=0}, meaning that we don't know the leader yet, or {Term=t, Lead=n}, meaning that we've learned the current-term leader. In both cases, AccTerm <= Term, and AccTerm only reflects the state of the log.
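To make the relationship between the three fields concrete, here is a minimal sketch in Go. The hardState type and its validate method are hypothetical stand-ins for illustration, not the raft package's HardState, and AccTerm is only one of the names proposed in this thread.

```go
package main

import "fmt"

// hardState is a hypothetical, simplified stand-in for raft's HardState.
// AccTerm is a name proposed in this thread, not an existing field.
type hardState struct {
	Term    uint64 // current term of this node
	Lead    uint64 // leader known for Term; 0 means "not known yet"
	AccTerm uint64 // term of the leader whose appends the log last accepted
}

// validate checks the relationship discussed above: Lead always refers to
// Term, while AccTerm describes only the log and can lag behind Term.
func (hs hardState) validate() error {
	if hs.AccTerm > hs.Term {
		return fmt.Errorf("AccTerm %d exceeds Term %d", hs.AccTerm, hs.Term)
	}
	return nil
}

func main() {
	states := []hardState{
		{Term: 5, Lead: 0, AccTerm: 3}, // leader of term 5 not known yet
		{Term: 5, Lead: 2, AccTerm: 5}, // leader known, log accepted its appends
		{Term: 5, Lead: 2, AccTerm: 6}, // invalid: log is ahead of Term
	}
	for _, hs := range states {
		fmt.Printf("%+v valid=%t\n", hs, hs.validate() == nil)
	}
}
```

Note that {Term=5, Lead=2, AccTerm=3} would also pass: the node has learned the leader of term 5 but its log has not yet accepted an append from it.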
Do we need the leader term in both Soft and Hard state?
The Lead field will be guaranteed to match the Term though, right? We'll either have {Term=t, Lead=0} meaning that we don't know the leader yet, or {Term=t, Lead=n} meaning that we've learned the current-term leader.
Yes, that is correct.
The raft log currently does not "remember" the last term of the leader who appended entries to the log. The state of a raft instance currently contains the Term of its latest vote, and the node it voted for might or might not be the leader. This means that the content of the log is not necessarily a prefix of the log of this Term's leader.
The impact of this manifests in multiple ways:
MsgApp message, or has to cap it at the follower's Match index.
We should introduce a "leader term" field into the state (both the HardState and the in-memory state of the raft log), with the following invariant:
The LeaderTerm should be updated every time the log accepts an append from a leader. The "leader term" can be used for safety checks on the follower, before advancing the commit index. It can also be used for a simpler async log protocol.
Ultimately, the LeaderTerm is the missing piece of state that makes the Raft log equivalent to a Paxos acceptor (TODO: link to the doc). The equivalence is that the LeaderTerm of the log is the "max accepted proposal ID" in Paxos.
The introduction of LeaderTerm can be done as: LeaderTerm = Log.Entry[last].Term.
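A rough Go sketch of how the pieces described in this issue could fit together: initializing LeaderTerm from the last entry, bumping it on every accepted append, and using it as a follower-side check before advancing the commit index. The raftLog type and the appendFrom and maybeCommit methods are hypothetical and heavily simplified. In particular, the sketch assumes the caller has already done the usual previous-entry match and conflict truncation, and it reads LeaderTerm as "the log is consistent with the leader at this term".

```go
package main

import "fmt"

type entry struct {
	Index, Term uint64
}

// raftLog is a hypothetical, simplified log, not the raft package's type.
type raftLog struct {
	entries    []entry
	leaderTerm uint64 // term of the leader whose append was last accepted
	committed  uint64
}

// newRaftLog initializes leaderTerm from the last entry's term, following
// the LeaderTerm = Log.Entry[last].Term rule from the issue.
func newRaftLog(entries []entry) *raftLog {
	l := &raftLog{entries: entries}
	if n := len(entries); n > 0 {
		l.leaderTerm = entries[n-1].Term
	}
	return l
}

func (l *raftLog) lastIndex() uint64 {
	if n := len(l.entries); n > 0 {
		return l.entries[n-1].Index
	}
	return 0
}

// appendFrom records entries accepted from the leader at the given term.
// Assumption: log matching and conflict truncation already happened.
func (l *raftLog) appendFrom(term uint64, ents ...entry) bool {
	if term < l.leaderTerm {
		return false // append from a stale leader
	}
	l.entries = append(l.entries, ents...)
	l.leaderTerm = term
	return true
}

// maybeCommit advances the commit index on behalf of the leader at term.
// It refuses unless the log is known to be consistent with that leader.
func (l *raftLog) maybeCommit(term, leaderCommit uint64) bool {
	if term != l.leaderTerm {
		return false
	}
	if last := l.lastIndex(); leaderCommit > last {
		leaderCommit = last
	}
	if leaderCommit <= l.committed {
		return false
	}
	l.committed = leaderCommit
	return true
}

func main() {
	l := newRaftLog([]entry{{1, 1}, {2, 1}, {3, 2}})
	fmt.Println("leaderTerm:", l.leaderTerm)
	// The term-3 leader has not appended to this log yet, so its commit
	// index cannot be trusted against the local entries.
	fmt.Println("commit before append:", l.maybeCommit(3, 3))
	l.appendFrom(3, entry{4, 3})
	ok := l.maybeCommit(3, 10)
	fmt.Println("commit after append:", ok, "committed:", l.committed)
}
```

With this check in place, the follower in the sketch needs no cap at a Match index: once its leaderTerm equals the leader's term, it can commit up to its own last index.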
Jira issue: CRDB-37894
Epic CRDB-39898