Open zhangh43 opened 2 years ago
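Also note that C still treats A as the leader of ng0 and rejects B as the new leader. How can A give up the leadership by itself?
The braft documentation quoted below shows that braft deliberately handles the case where a node comes back online and disrupts the replication group. But in my case above, leader A still accepted the prevote and vote from the restarted node B and stepped itself down. Is this problem related to the lease? Node A's election timeout is 1000 ms.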
Symmetric network partitioning
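The original Raft paper handles symmetric network partitioning as follows: after a node comes back online, the leader steps down as soon as it receives a RequestVote request with a term higher than currentTerm. As a result, even a node that has already been removed via RemovePeer will still interrupt the current lease and make the replication group unavailable. This case can be handled specially: the leader does not accept RequestVote requests. Specifically:
For a node that belongs to the PeerSet, the leader steps down when a retried AppendEntries encounters a higher term.
For a node that does not belong to the PeerSet, the leader ignores it permanently.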
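The rule quoted from the documentation can be sketched as a toy model. This is only an illustration of the two cases as described in the text, not braft's actual code: the `Leader` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Toy model of a leader (hypothetical, not braft's NodeImpl)."""
    current_term: int
    peer_set: set = field(default_factory=set)
    stepped_down: bool = False

    def on_request_vote(self, candidate: str, term: int) -> bool:
        # Documented rule: the leader does not accept RequestVote at all,
        # so it neither grants the vote nor steps down here.
        return False

    def on_append_entries_response(self, peer: str, term: int) -> None:
        # Nodes outside the PeerSet are never replicated to, so they are ignored.
        if peer not in self.peer_set:
            return
        # A PeerSet member reports its higher term when AppendEntries is retried;
        # only then does the leader step down.
        if term > self.current_term:
            self.current_term = term
            self.stepped_down = True
```

Under this model, a rejoining node can only unseat the leader through the AppendEntries path, and only if it is still a member of the group.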
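This is a bug. In the current code, a pre-vote uses the local term + 1 as its term, so node A's term equals the pre-vote term. In braft this case passes the pre-vote, but it should actually be rejected.
Is there a plan to fix this in the next release? For example, add a condition when the leader checks a prevote: reject it if the leader lease is still valid?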
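The proposed condition might look roughly like the sketch below. It is a simplified model, not the braft implementation: `LeaderLease`, `handle_pre_vote`, and the millisecond bookkeeping are assumptions made for illustration, and the usual log up-to-date check is omitted.

```python
from dataclasses import dataclass


@dataclass
class LeaderLease:
    """Hypothetical leader lease: valid for lease_ms after the last quorum ack."""
    lease_ms: int
    last_ack_ms: int

    def is_valid(self, now_ms: int) -> bool:
        return now_ms - self.last_ack_ms < self.lease_ms


def handle_pre_vote(current_term: int, request_term: int,
                    lease: LeaderLease, now_ms: int) -> tuple:
    """Return (granted, rejected_by_lease) for a PreVote received by a leader."""
    if request_term < current_term:
        return (False, False)  # stale candidate, reject outright
    if lease.is_valid(now_ms):
        return (False, True)   # proposed extra condition: lease still valid
    return (True, False)       # lease expired, the pre-vote may pass
```

With a 1000 ms lease renewed at time 0, a pre-vote whose term equals the leader's term is rejected at 500 ms and granted at 1500 ms.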
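Node A, as the leader, holds a lease. I think the key question is why node A did not reject_by_lease. From the code, each node rejects prevotes based on follower_lease, and follower_lease is renewed in handle_append_entries_request. So how does the leader's follower_lease get updated? Does the leader also send append_entries to itself?
One more question: under what conditions does node B start a prevote after restarting? The problem in this issue is random; most of the time B rejoins the raft group directly after restarting, without doing a prevote.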
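The leader has no follower lease. braft allows a follower to contend for leadership; once it has obtained the leader's vote, the existing follower lease on the other followers can be broken.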
I'll fix this.
This is probably a matter of timing: if the leader's heartbeat to B arrives first, B will not start a prevote.
I have a raft group ng0 consisting of three nodes: A (localhost:8001), B (localhost:8011), and C (localhost:8021). The timeline is as follows:
T1: A is elected as the leader of group ng0, term 2.
T2: Restart B.
T3: B starts a prevote for ng0, with log message
node ng0:127.0.0.1:8011 term 2 start pre_vote
T4: B gets a prevote grant ack from A, with log message
node ng0:127.0.0.1:8011:1 received PreVoteResponse from 127.0.0.1:8001:0 term 2 granted 1 rejected_by_lease 0 disrupted 1
T5: B starts to vote.
T6: B gets a vote grant ack from A.
node ng0:127.0.0.1:8011:1 received RequestVoteResponse from 127.0.0.1:8001:0 term 3 granted 1 rejected_by_lease 0 disrupted 1
T7: B gets a vote reject from C.
node ng0:127.0.0.1:8011:1 received RequestVoteResponse from 127.0.0.1:8021:0 term 2 granted 0 rejected_by_lease 1 disrupted 0
T8: B becomes the leader of ng0 with term 3.
It's quite strange that the old leader A accepts the prevote and vote from the rebooted node B. This only happens occasionally, but I want to know whether it is expected behavior or a random bug in braft.
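The outcome at T8 follows from ordinary Raft majority counting: a candidate votes for itself, so A's grant at T6 is enough for B to win even though C rejects at T7. A minimal sketch of that arithmetic (the helper `has_quorum` is hypothetical, not a braft function):

```python
def has_quorum(granted: int, cluster_size: int) -> bool:
    """A candidate wins once strictly more than half the nodes grant it."""
    return granted > cluster_size // 2


# Responses B received, as reported in the logs at T6 and T7.
responses = {"A": True, "C": False}

# B's own vote plus the grants it received from the other nodes.
votes = 1 + sum(responses.values())
b_wins = has_quorum(votes, cluster_size=3)
```

Had A rejected the vote as C did, B would have held only its own vote and could not have become leader.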