Do not reset the electionElapsed if the node doesn't grant vote

ahrtr commented 6 months ago

// If the local node receives a MsgVote message with higher term,
// but it doesn't grant the vote; it turns into a follower,
// but it shouldn't reset the electionElapsed, to ensure it
// has higher priority to start a campaign in the next round
// of election. If we reject a node, it's highly likely we
// will reject it again if it immediately campaigns again.
// So it may waste a long time to elect a leader if we reset
// the electionElapsed.

ahrtr commented 6 months ago

cc @pav-kv @erikgrinaker PTAL

joshuazh-x commented 6 months ago

I assume the motivation of this change comes from https://github.com/etcd-io/etcd/issues/17455. However, there might be a potential side effect if we don't reset election timeout.

When such a node does not reset its election timeout, it may quickly start a new campaign and makes the winning candidate step down to follower again before it becomes leader. It can be even worse if the node cannot win election (for example, it does not have sufficient connections to other nodes due to network issue). Then the node may keep stopping other candidates from becoming leader.

ahrtr commented 6 months ago

I assume the motivation of this change comes from etcd-io/etcd#17455.

Yes.

When such a node does not reset its election timeout, it may quickly start a new campaign and makes the winning candidate step down to follower again before it becomes leader.

This PR doesn't change the local node's frequency to campaign. The PR just prevents any node being affected by other unqualified candidate.

This PR only fixes the case when pre-vote isn't enabled. So as long as pre-vote is enabled, there is no any side effect.
The side effect you mentioned only exist when two nodes campaign at the same time. It's resolved by the randomised election timeout.

ahrtr commented 6 months ago

Prefer to the alternative PR https://github.com/etcd-io/raft/pull/169,

169 has better readability than this PR
In this PR, the local node's term has already increased by 1 even it rejects the Vote. So Conceptually it should reset the electionElapsed.

pav-kv commented 6 months ago

@ahrtr

In this PR, the local node's term has already increased by 1 even it rejects the Vote. So Conceptually it should reset the electionElapsed.

Why does it conceptually need to reset the timeout? The timeout behaviours are heuristics, so there is no right or wrong ways. We should consider behaviours and do some calculations to compare heuristics.

Do we need to reset electionTimeout at all in situations when a non-leader state remains non-leader? E.g. when we BecomeFollower, and we were already in StateFollower, just with a lower term. Upd: I guess we do need to sometimes reset it, because of https://github.com/etcd-io/raft/pull/167#issuecomment-1955813036.

ahrtr commented 6 months ago

Upd: I guess we do need to sometimes reset it, because of #167 (comment).

YES, it's one of the reasons why I prefer to https://github.com/etcd-io/raft/pull/169. The other reason is #169 has better readability.

Why does it conceptually need to reset the timeout? The timeout behaviours are heuristics, so there is no right or wrong ways. We should consider behaviours and do some calculations to compare heuristics.

Do we need to reset electionTimeout at all in situations when a non-leader state remains non-leader?

All election time parameters (e.g. randomizedElectionTimeout) are supposed to be bind with each term. If the term changes (increases), then we should reset all such time parameters. At least, this is current implementation. If we don't reset the value, and carry over to next term, we don't know what other side effect it may cause apart from https://github.com/etcd-io/raft/pull/167#issuecomment-1955813036. I agree with "there is no right or wrong ways", but I tend not to take risk for now.

etcd-io / raft

Do not reset the electionElapsed if the node doesn't grant vote #167

169 has better readability than this PR