etcd-io / raft

Raft library for maintaining a replicated state machine
Apache License 2.0
630 stars 160 forks source link

rafttest: TestRestart is flaky #181

Closed pav-kv closed 5 months ago

pav-kv commented 6 months ago

TestRestart failed (when run many times) at commit ed26e90 with the following log:

=== RUN   TestRestart
raft2024/03/08 13:35:12 INFO: 1 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 1 became follower at term 0
raft2024/03/08 13:35:12 INFO: newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/08 13:35:12 INFO: 1 became follower at term 1
raft2024/03/08 13:35:12 INFO: 1 switched to configuration voters=(1)
raft2024/03/08 13:35:12 INFO: 1 switched to configuration voters=(1 2)
raft2024/03/08 13:35:12 INFO: 1 switched to configuration voters=(1 2 3)
raft2024/03/08 13:35:12 INFO: 1 switched to configuration voters=(1 2 3 4)
raft2024/03/08 13:35:12 INFO: 1 switched to configuration voters=(1 2 3 4 5)
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 2 became follower at term 0
raft2024/03/08 13:35:12 INFO: newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/08 13:35:12 INFO: 2 became follower at term 1
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=(1)
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=(1 2)
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=(1 2 3)
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=(1 2 3 4)
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=(1 2 3 4 5)
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 3 became follower at term 0
raft2024/03/08 13:35:12 INFO: newRaft 3 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/08 13:35:12 INFO: 3 became follower at term 1
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=(1)
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=(1 2)
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=(1 2 3)
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=(1 2 3 4)
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=(1 2 3 4 5)
raft2024/03/08 13:35:12 INFO: 4 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 4 became follower at term 0
raft2024/03/08 13:35:12 INFO: newRaft 4 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/08 13:35:12 INFO: 4 became follower at term 1
raft2024/03/08 13:35:12 INFO: 4 switched to configuration voters=(1)
raft2024/03/08 13:35:12 INFO: 4 switched to configuration voters=(1 2)
raft2024/03/08 13:35:12 INFO: 4 switched to configuration voters=(1 2 3)
raft2024/03/08 13:35:12 INFO: 4 switched to configuration voters=(1 2 3 4)
raft2024/03/08 13:35:12 INFO: 4 switched to configuration voters=(1 2 3 4 5)
raft2024/03/08 13:35:12 INFO: 5 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 5 became follower at term 0
raft2024/03/08 13:35:12 INFO: newRaft 5 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2024/03/08 13:35:12 INFO: 5 became follower at term 1
raft2024/03/08 13:35:12 INFO: 5 switched to configuration voters=(1)
raft2024/03/08 13:35:12 INFO: 5 switched to configuration voters=(1 2)
raft2024/03/08 13:35:12 INFO: 5 switched to configuration voters=(1 2 3)
raft2024/03/08 13:35:12 INFO: 5 switched to configuration voters=(1 2 3 4)
raft2024/03/08 13:35:12 INFO: 5 switched to configuration voters=(1 2 3 4 5)
raft2024/03/08 13:35:12 INFO: 3 is starting a new election at term 1
raft2024/03/08 13:35:12 INFO: 3 became candidate at term 2
raft2024/03/08 13:35:12 INFO: 3 [logterm: 1, index: 5] sent MsgVote request to 1 at term 2
raft2024/03/08 13:35:12 INFO: 3 [logterm: 1, index: 5] sent MsgVote request to 2 at term 2
raft2024/03/08 13:35:12 INFO: 3 [logterm: 1, index: 5] sent MsgVote request to 4 at term 2
raft2024/03/08 13:35:12 INFO: 3 [logterm: 1, index: 5] sent MsgVote request to 5 at term 2
raft2024/03/08 13:35:12 INFO: 3 received MsgVoteResp from 3 at term 2
raft2024/03/08 13:35:12 INFO: 3 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/08 13:35:12 INFO: 5 [term: 1] received a MsgVote message with higher term from 3 [term: 2]
raft2024/03/08 13:35:12 INFO: 5 became follower at term 2
raft2024/03/08 13:35:12 INFO: 5 [logterm: 1, index: 5, vote: 0] cast MsgVote for 3 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 2 is starting a new election at term 1
raft2024/03/08 13:35:12 INFO: 2 became candidate at term 2
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5] sent MsgVote request to 1 at term 2
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5] sent MsgVote request to 3 at term 2
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5] sent MsgVote request to 4 at term 2
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5] sent MsgVote request to 5 at term 2
raft2024/03/08 13:35:12 INFO: 4 is starting a new election at term 1
raft2024/03/08 13:35:12 INFO: 4 became candidate at term 2
raft2024/03/08 13:35:12 INFO: 4 [logterm: 1, index: 5] sent MsgVote request to 1 at term 2
raft2024/03/08 13:35:12 INFO: 4 [logterm: 1, index: 5] sent MsgVote request to 2 at term 2
raft2024/03/08 13:35:12 INFO: 4 [logterm: 1, index: 5] sent MsgVote request to 3 at term 2
raft2024/03/08 13:35:12 INFO: 4 [logterm: 1, index: 5] sent MsgVote request to 5 at term 2
raft2024/03/08 13:35:12 INFO: 4 received MsgVoteResp from 4 at term 2
raft2024/03/08 13:35:12 INFO: 4 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/08 13:35:12 INFO: 2 received MsgVoteResp from 2 at term 2
raft2024/03/08 13:35:12 INFO: 2 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/08 13:35:12 INFO: 4 [logterm: 1, index: 5, vote: 4] rejected MsgVote from 3 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 3 received MsgVoteResp rejection from 4 at term 2
raft2024/03/08 13:35:12 INFO: 3 has received 1 MsgVoteResp votes and 1 vote rejections
raft2024/03/08 13:35:12 INFO: 5 [logterm: 1, index: 5, vote: 3] rejected MsgVote from 2 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 1 [term: 1] received a MsgVote message with higher term from 3 [term: 2]
raft2024/03/08 13:35:12 INFO: 1 became follower at term 2
raft2024/03/08 13:35:12 INFO: 1 [logterm: 1, index: 5, vote: 0] cast MsgVote for 3 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5, vote: 2] rejected MsgVote from 3 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 3 received MsgVoteResp from 5 at term 2
raft2024/03/08 13:35:12 INFO: 3 has received 2 MsgVoteResp votes and 1 vote rejections
raft2024/03/08 13:35:12 INFO: 1 [logterm: 1, index: 5, vote: 3] rejected MsgVote from 2 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 4 [logterm: 1, index: 5, vote: 4] rejected MsgVote from 2 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 5 [logterm: 1, index: 5, vote: 3] rejected MsgVote from 4 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 3 received MsgVoteResp from 1 at term 2
raft2024/03/08 13:35:12 INFO: 3 has received 3 MsgVoteResp votes and 1 vote rejections
raft2024/03/08 13:35:12 INFO: 3 became leader at term 2
raft2024/03/08 13:35:12 INFO: raft.node: 3 elected leader 3 at term 2
raft2024/03/08 13:35:12 INFO: 1 [logterm: 1, index: 5, vote: 3] rejected MsgVote from 4 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 3 [logterm: 2, index: 6, vote: 3] rejected MsgVote from 4 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: raft.node: 1 elected leader 3 at term 2
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5, vote: 2] rejected MsgVote from 4 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: 3 [logterm: 2, index: 6, vote: 3] rejected MsgVote from 2 [logterm: 1, index: 5] at term 2
raft2024/03/08 13:35:12 INFO: raft.node: 5 elected leader 3 at term 2
raft2024/03/08 13:35:12 INFO: 2 received MsgVoteResp rejection from 5 at term 2
raft2024/03/08 13:35:12 INFO: 2 has received 1 MsgVoteResp votes and 1 vote rejections
2024/03/08 13:35:12 raft.2: stop
raft2024/03/08 13:35:12 INFO: 4 received MsgVoteResp rejection from 5 at term 2
raft2024/03/08 13:35:12 INFO: 4 has received 1 MsgVoteResp votes and 1 vote rejections
raft2024/03/08 13:35:12 INFO: 4 became follower at term 2
raft2024/03/08 13:35:12 INFO: raft.node: 4 elected leader 3 at term 2
2024/03/08 13:35:12 raft.3: stop
raft2024/03/08 13:35:12 INFO: 3 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 3 became follower at term 2
raft2024/03/08 13:35:12 INFO: newRaft 3 [peers: [], term: 2, commit: 5, applied: 0, lastindex: 8, lastterm: 2]
raft2024/03/08 13:35:12 INFO: 2 switched to configuration voters=()
raft2024/03/08 13:35:12 INFO: 2 became follower at term 2
raft2024/03/08 13:35:12 INFO: newRaft 2 [peers: [], term: 2, commit: 5, applied: 0, lastindex: 5, lastterm: 1]
raft2024/03/08 13:35:12 INFO: raft.node: 2 elected leader 3 at term 2
raft2024/03/08 13:35:12 INFO: 1 is starting a new election at term 2
raft2024/03/08 13:35:12 INFO: 1 became candidate at term 3
raft2024/03/08 13:35:12 INFO: 1 [logterm: 2, index: 8] sent MsgVote request to 2 at term 3
raft2024/03/08 13:35:12 INFO: 1 [logterm: 2, index: 8] sent MsgVote request to 3 at term 3
raft2024/03/08 13:35:12 INFO: 1 [logterm: 2, index: 8] sent MsgVote request to 4 at term 3
raft2024/03/08 13:35:12 INFO: 1 [logterm: 2, index: 8] sent MsgVote request to 5 at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 1 lost leader 3 at term 3
raft2024/03/08 13:35:12 INFO: 1 received MsgVoteResp from 1 at term 3
raft2024/03/08 13:35:12 INFO: 1 has received 1 MsgVoteResp votes and 0 vote rejections
raft2024/03/08 13:35:12 INFO: 4 [term: 2] received a MsgVote message with higher term from 1 [term: 3]
raft2024/03/08 13:35:12 INFO: 4 became follower at term 3
raft2024/03/08 13:35:12 INFO: 4 [logterm: 2, index: 6, vote: 0] cast MsgVote for 1 [logterm: 2, index: 8] at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 4 lost leader 3 at term 3
raft2024/03/08 13:35:12 INFO: 3 [term: 2] received a MsgVote message with higher term from 1 [term: 3]
raft2024/03/08 13:35:12 INFO: 3 became follower at term 3
raft2024/03/08 13:35:12 INFO: 3 [logterm: 2, index: 8, vote: 0] cast MsgVote for 1 [logterm: 2, index: 8] at term 3
raft2024/03/08 13:35:12 INFO: 1 received MsgVoteResp from 4 at term 3
raft2024/03/08 13:35:12 INFO: 1 has received 2 MsgVoteResp votes and 0 vote rejections
raft2024/03/08 13:35:12 INFO: 2 [term: 2] received a MsgVote message with higher term from 1 [term: 3]
raft2024/03/08 13:35:12 INFO: 2 became follower at term 3
raft2024/03/08 13:35:12 INFO: 2 [logterm: 1, index: 5, vote: 0] cast MsgVote for 1 [logterm: 2, index: 8] at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 2 lost leader 3 at term 3
raft2024/03/08 13:35:12 INFO: 1 received MsgVoteResp from 3 at term 3
raft2024/03/08 13:35:12 INFO: 1 has received 3 MsgVoteResp votes and 0 vote rejections
raft2024/03/08 13:35:12 INFO: 1 became leader at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 1 elected leader 1 at term 3
raft2024/03/08 13:35:12 INFO: 5 [term: 2] received a MsgVote message with higher term from 1 [term: 3]
raft2024/03/08 13:35:12 INFO: 5 became follower at term 3
raft2024/03/08 13:35:12 INFO: 5 [logterm: 2, index: 6, vote: 0] cast MsgVote for 1 [logterm: 2, index: 8] at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 5 lost leader 3 at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 4 elected leader 1 at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 2 elected leader 1 at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 3 elected leader 1 at term 3
raft2024/03/08 13:35:12 INFO: raft.node: 5 elected leader 1 at term 3
    node_test.go:83: commits failed to converge!
--- FAIL: TestRestart (5.13s)

Looking at the test implementation, I suspect the problem is in the waitLeader function. It waits for a wrong signal, and erroneously reports some non-leader node to be the leader. The n.Status().SoftState.Lead that it collects from all the nodes only represents votes, but the actual elected leader can end up different.

A fix would be to wait for a node that is in StateLeader.

pav-kv commented 6 months ago

The n.Status().SoftState.Lead that it collects from all the nodes only represents votes

This statement needs checking. After a quick scan of the code, I see that it might be false. The lead field is updated only when getting messages that only a leader can send.

pav-kv commented 6 months ago

But the problem is in these lines:

raft2024/03/08 13:35:12 INFO: 3 became leader at term 2
...
2024/03/08 13:35:12 raft.2: stop
...
2024/03/08 13:35:12 raft.3: stop

Node 3 is the leader, but it was stopped. However, the test code does not intend to stop the leader, which means we might have detected the leader incorrectly.

MrDXY commented 6 months ago

Hi @pav-kv , based on the log you provided, I think I may have figured out why it will stop a leader.

  1. raft.2 and raft.3 has been stopped in the log.
    • That means the value that waitLeader function returns is 0. (It's not expected, but possible, I'll explain why)
    • Meaning node[0+1] and node[0+2](The Leader) will be stoped.
  2. In the log you provided, raft.node3 is elected as a Leader, and in the very next few lines, raft.node5 confirms it has a new Leader
    • That means, both raft.node3 and raft.node5 hves updated their raft.lead to 3.
  3. In the function, waitLeader,
    
    for i, n := range ns {
    lead := n.Status().SoftState.Lead
    if lead != 0 {
        l[lead] = struct{}{}
        if n.id == lead {
            lindex = i
        }
    }
    }
    if len(l) == 1 {
    return lindex
    }

- Let's say, i is 4 now, and the leader of node[4] has changed from 0 to 3 in step2
- So `l[lead] = struct{}{}` will be executed.  
- But `n.id == lead` is false, lindex still equals 0.
- The iteration ends, and `len(l)` now equals 1.
- This function will return lindex, which is 0.

Another thing is, it is also possible that `n.Status()` could be blocked if the event loop in the `run` function does not handle the status channel in time, potentially allowing for the 4th iteration takes more time to execute, and give time for the Leader update. Meaning 3th iteration still don't have a Leader, while 4th iteration elected one.(Not sure about this)
MrDXY commented 6 months ago

Hi @pav-kv , I would like to submit a pull request to address this issue. My proposed solution involves using -1 as the initial value of 'lindex' to prevent confusion between 'not found' and 'first element'. The code would look something like this. Would this solution be acceptable to you?

func waitLeader(ns []*node) int {
    var l map[uint64]struct{}   
    // Use -1 instead of 0 to avoid confusion between 'not found' and 'first element'.
    var lindex = -1

    for {
        l = make(map[uint64]struct{})

        for i, n := range ns {
            lead := n.Status().SoftState.Lead
            if lead != 0 {
                // Set l[lead] if lead != 0 to detect a two-leader case. Since a follower can also have an old leader.
                l[lead] = struct{}{}
                // Update lindex when a node itself becomes a leader to avoid setting a follower's old leader as lindex.
                if n.id == lead {
                    lindex = i
                }
            }
        }

        if len(l) == 1 && lindex != -1 {
            return lindex
        }
    }
}
MrDXY commented 5 months ago

Hi @pav-kv , Whenever you get a chance, would you mind taking a look at this proposal? No pressure at all, I'd just really appreciate your input and ideas. Thanks!