etcd-io / raft

Raft library for maintaining a replicated state machine
Apache License 2.0
630 stars 160 forks source link

Test: Fix flaky leader index in `waitLeader` function #188

Closed MrDXY closed 5 months ago

MrDXY commented 5 months ago

Summary

The waitLeader function currently has an issue where it returns the wrong leader index in certain scenarios. When iterating through nodes to find the leader, if the leader changes during the iteration, the function incorrectly returns the default value instead of the actual leader index.

Example

In the function, waitLeader: https://github.com/etcd-io/raft/blob/e1bfcf718c72ad64e12fcfd867bb012f19a14694/rafttest/node_test.go#L130-L151

Let's say,

  1. i is 3 now, and we still don't have a leader elected.
  2. Then node[1] is elected as a leader,
  3. i is 4 now, and the leader of node[4] has changed from 0 to 1 in step2. So l[lead] = struct{}{} will be executed. But n.id == lead is false, lindex still equals 0.
  4. The iteration ends, and len(l) now equals 1. This function will return lindex, which is 0. But 0 is not the real leader index, but the default value.

Proposed Changes

To address this issue, the proposed solution involves initializing the lindex variable to -1 instead of 0. This change ensures that lindex holds a distinct value indicating 'not found' rather than potentially conflating it with the index of the first element.

Changes Made

Impact

This change ensures that the waitLeader function correctly returns the leader index, avoiding confusion between 'not found' and the index of the first element.

Fixes #181