eBay / NuRaft

C++ implementation of Raft core logic as a replication library
Apache License 2.0

About decay_target_priority #364

Open xylophonee opened 2 years ago

xylophonee commented 2 years ago

Hi, I have started only one node. Is it necessary to call 'decay_target_priority' on an election timeout?

2022-07-10T20:58:22.064_541+08:00 [9675] [WARN] Election timeout, initiate leader election [handle_timeout.cxx:286, handle_election_timeout()]
2022-07-10T20:58:22.064_586+08:00 [9675] [INFO] [PRIORITY] decay, target 100 -> 80, mine 100 [handle_priority.cxx:212, decay_target_priority()]

Another question: is 'decay_target_priority' required if there is no current leader?

if (!hb_alive_) {
    // Not the first election timeout, decay the target priority.
    decay_target_priority();
}

I configured three nodes and killed the leader. Why is 'decay_target_priority' not triggered on the election timeout?

2022-07-10T21:01:24.527_332+08:00 [9675] [WARN] Election timeout, initiate leader election [handle_timeout.cxx:286, handle_election_timeout()]
2022-07-10T21:01:24.527_351+08:00 [9675] [INFO] [ELECTION TIMEOUT] current role: follower, log last term 1, state term 1, target p 100, my p 100, hb alive, pre-vote NOT done [handle_timeout.cxx:304, handle_election_timeout()]
2022-07-10T21:01:24.527_361+08:00 [9675] [INFO] reset RPC client for peer 2 [handle_vote.cxx:79, request_prevote()]

greensky00 commented 2 years ago

@xylophonee You can refer to priority-based leader election.

The first-round vote request is based on the original target priority. The first decay happens after the first-round vote (when it did not succeed) and before the second-round vote. Hence you don't see the decay while hb_alive == true, because hb_alive only becomes false during the first-round vote.

Regarding the first question, it is a universal logic based on the target priority and the hb_alive flag, regardless of the number of nodes in a cluster. So there is no reason to skip decay_target_priority for a single-node cluster; special-casing it would only make the code more verbose. The call has no impact if the cluster has only one server, though: that server always proceeds with the vote, ignoring the priority.
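
The ordering described in this reply can be sketched with a small stand-alone model. This is not NuRaft code: the struct `ElectionModel`, its fields, and `on_election_timeout()` are hypothetical names, and the decay rule (lower the target by one fifth, at least 10, never below 1) is an assumption chosen only because it reproduces the logged `100 -> 80` step.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified model of one node's election-timeout handling.
// Names mirror the thread (hb_alive, target priority); this is not NuRaft code.
struct ElectionModel {
    int my_priority;         // this node's configured priority
    int target_priority;     // priority currently required to start a vote
    int num_voting_members;  // cluster size
    bool hb_alive;           // true until the first election timeout is handled

    // Assumed decay rule: lower the target by one fifth (at least 10),
    // never below 1. Chosen to match the logged "100 -> 80" transition.
    void decay_target_priority() {
        assert(target_priority >= 1);
        int gap = std::max(10, target_priority / 5);
        target_priority = std::max(1, target_priority - gap);
    }

    // Handles one election timeout; returns true if this node starts a vote.
    bool on_election_timeout() {
        if (!hb_alive) {
            // Not the first election timeout: decay before this round.
            decay_target_priority();
        }
        // The first round used the undecayed target; clear the flag so that
        // the next timeout decays.
        hb_alive = false;

        if (num_voting_members == 1) {
            // Single-server cluster: always vote, priority is ignored.
            return true;
        }
        return my_priority >= target_priority;
    }
};
```

In this model, a node with priority 50 in a three-node cluster and an initial target of 100 does not vote on the first timeout (target stays 100), then sees the target fall to 80, 64, 52 and 42 on later timeouts, and starts a vote once the target reaches 42. A single-node model still runs the decay when `hb_alive` is false, but the result never changes whether it votes.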

xylophonee commented 2 years ago

@greensky00 Thanks, I understand. I would like to ask how you handle a leader that is running slowly. The current consistency protocol addresses the case of node death.

greensky00 commented 2 years ago

Not sure what "running slow" exactly means. If the leader is slower than the other members, then doing a leader election and changing the leadership is desirable. If the leader and all the other members are slow, that means the incoming traffic is too heavy. In that case, a leader election doesn't help, so you need to throttle the traffic.
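
The two cases in this reply can be written down as a small decision helper. This is only a sketch under stated assumptions: the `Action` enum, the `decide()` function, the use of per-node latency as the measure of "slow", and the `slow_ms` threshold are all hypothetical, and nothing here calls into NuRaft.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical outcome of a health check on the current leader.
enum class Action { None, TransferLeadership, ThrottleTraffic };

// Decides what to do from observed latencies (milliseconds).
// Assumption: a node counts as "slow" when its latency is >= slow_ms.
inline Action decide(double leader_ms,
                     const std::vector<double>& follower_ms,
                     double slow_ms) {
    assert(slow_ms > 0.0);
    if (leader_ms < slow_ms) {
        return Action::None;  // the leader is keeping up
    }
    // The leader is slow: check whether any other member is still fast.
    bool any_fast_follower =
        std::any_of(follower_ms.begin(), follower_ms.end(),
                    [slow_ms](double ms) { return ms < slow_ms; });
    // Only the leader is slow: a new leader can help.
    // Everyone is slow: the load is too heavy, so reduce incoming traffic.
    return any_fast_follower ? Action::TransferLeadership
                             : Action::ThrottleTraffic;
}
```

With a threshold of 100 ms, a leader at 250 ms next to followers at 20 ms and 30 ms yields `TransferLeadership`, while a leader at 250 ms with followers at 300 ms and 280 ms yields `ThrottleTraffic`. Treating "at least one fast follower" as enough to justify a leadership change is a design choice of this sketch, not something the thread specifies.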