Open myrrc opened 1 year ago
Neither yield_leadership nor request_leadership can enforce adding or removing a member. This is because there is no guarantee that a membership change will eventually succeed and be committed. For example, server 1 is the leader and receives an add-server request, but fails to replicate the message because of a network partition.
Also, membership changes should be done one at a time. The next membership change should start only after making sure that the previous one is committed. There is a known problem that multiple simultaneous membership changes may result in an incorrect quorum and data inconsistency. The original paper tried to resolve this with "joint consensus", but NuRaft does not implement it and instead enforces one member change at a time.
There was a similar thread: https://github.com/eBay/NuRaft/issues/177
Situation:
yield_leadership (if a request got on 1) or request_leadership (if we're on 2 or 3).
request_leadership. The request gets to 1. 1 pauses writes
Quite a synthetic example, but that's what we encountered in
So, one fix option is to wait for the new config to be committed and to execute new commands only after that, but I wonder whether there's an option to solve this at the library level.
I tried changing https://github.com/eBay/NuRaft/blob/188947bcc73ce38ab1c3cf9d01015ca8a29decd9/src/raft_server.cxx#L1244 so that a toggled option would make the leader commit all appended entries before pausing writes, but no luck: it seems there are way too many invariants that get broken.