Fullstop000 opened this issue 5 years ago.
Hi @Fullstop000, we don't think a dynamic change of quorum size is safe, although we haven't proved it rigorously. In eBay's use cases, the quorum size basically remains unchanged, and we adjust it only for manual recovery in fatal cases.
If you want to change it on the fly, how about increasing one of them first (Qc(3) -> Qc(5)), waiting for at least one commit, and then decreasing the other one (Qe(3) -> Qe(1))? Still, I'm not sure whether it is safe. @genezhang Please let me know your thoughts.
Thanks.
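The reasoning behind raising Qc before lowering Qe can be checked with quorum arithmetic. The Python sketch below is a simplified model, not NuRaft code, and rests on stated assumptions: each config change is assumed to take effect on every node at once, the cluster is assumed to have already committed entries under its starting Qc, and the step format and function names (`quorums_always_intersect`, `first_unsafe_step`) are hypothetical. It tracks how many nodes are guaranteed to hold every committed entry and checks that every election quorum overlaps that set.

```python
from itertools import combinations


def quorums_always_intersect(n, q1, q2):
    """Brute force: does every q1-node subset share a node with every q2-node subset?"""
    nodes = range(n)
    return all(set(a) & set(b)
               for a in combinations(nodes, q1)
               for b in combinations(nodes, q2))


def first_unsafe_step(n, qc, qe, steps):
    """Replay hypothetical (op, value) steps on an n-node cluster that has already
    committed entries under its initial Qc. Return the index of the first step after
    which an election might miss a committed entry, or None if every step is safe.
    Assumption: each config change reaches all nodes atomically.
    """
    # guard: every committed entry is known to be on at least this many nodes.
    # A commit under Qc puts the new entry, and (by Log Matching) every earlier
    # entry, on the same Qc nodes, so after a commit guard equals that Qc.
    guard = qc
    for i, (op, value) in enumerate(steps):
        if op == "set_qc":
            qc = value          # changing Qc alone does not move existing entries
        elif op == "set_qe":
            qe = value
        elif op == "commit":
            guard = qc
        # Election quorums must overlap both existing and future committed entries.
        if not (quorums_always_intersect(n, guard, qe)
                and quorums_always_intersect(n, qc, qe)):
            return i
    return None


direct = [("set_qc", 5), ("set_qe", 1)]
two_step = [("set_qc", 5), ("commit", None), ("set_qe", 1)]
print(first_unsafe_step(5, 3, 3, direct))    # 1
print(first_unsafe_step(5, 3, 3, two_step))  # None
```

Under these assumptions, skipping the intermediate commit leaves entries that may exist on only 3 nodes while a 1-node election quorum is allowed (3 + 1 is not greater than 5), whereas one commit under Qc(5) first puts every committed entry on all 5 nodes. The sketch checks quorum overlap only and does not model how new values reach each node, so it is not a proof that the procedure is safe.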
@greensky00 Thank you for the reply. The approach you suggest should be safe, because Leader Completeness is guaranteed in elections once an entry has been committed under Qc(5). It seems that only decreasing Qe might raise safety concerns.
I don't think a dynamic change of quorum size is safe. It is the same problem as membership changes: there is a way to make it right, but the easiest way is still to add or remove one node at a time. Manual changes could be used to recover a bad cluster, but, yes, more diagnostic tools need to be added to support real production issues. For example, a log entry may crash all nodes when the state machine tries to apply it; in that case we should have a tool to remove that log entry from the log store and restart the nodes, etc.
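The one-node-at-a-time rule for membership changes can be illustrated by brute force. This minimal Python sketch assumes plain majority quorums, and its helpers (`majorities`, `configs_always_overlap`) are hypothetical; it checks whether every majority of the old member set shares at least one node with every majority of the new one.

```python
from itertools import combinations


def majorities(members):
    """All minimal majority quorums (size floor(n/2) + 1) of a member set.

    Checking minimal majorities is enough: any larger quorum contains one.
    """
    members = sorted(members)
    size = len(members) // 2 + 1
    return [set(c) for c in combinations(members, size)]


def configs_always_overlap(old, new):
    """True if every majority of `old` shares a node with every majority of `new`."""
    return all(a & b for a in majorities(old) for b in majorities(new))


print(configs_always_overlap({0, 1, 2}, {0, 1, 2, 3}))        # True: add one
print(configs_always_overlap({0, 1, 2, 3, 4}, {0, 1, 2, 3}))  # True: remove one
print(configs_always_overlap({0, 1, 2}, {0, 1, 2, 3, 4}))     # False: add two at once
```

Adding or removing a single member keeps the two majorities overlapping, so the old and new configurations cannot elect independent leaders. Growing from 3 to 5 members in one step breaks this: {0, 1} is a majority of the old set and {2, 3, 4} is a majority of the new set.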
Basically, it's easy to understand that the flexible quorum is safe if it's a static config. But is Leader Completeness guaranteed when Qc and Qe are dynamically adjustable? For example, I have a 5-node cluster with Qc(3) and Qe(3) (the default algorithm), and then I change the config to Qc(5) and Qe(1). How can the cluster always elect a valid leader, given that up to 2 nodes may have incomplete Raft logs (an entry committed under Qc(3) only needs to reach 3 of the 5 nodes), while under Qe(1) a leader can be established by its own vote alone?
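The scenario in the question can be reproduced with a toy model. This Python sketch is hypothetical (the `ToyCluster` class is not NuRaft's API) and assumes every entry has the same term, so Raft's "at least as up-to-date" vote rule reduces to comparing last log indexes; it also ignores timeouts and one-vote-per-term bookkeeping. One entry is committed under Qc(3) on nodes 0, 1 and 2, and then the config is switched.

```python
class ToyCluster:
    """Hypothetical toy model: each node's log is reduced to its last index, and
    all entries share one term, so "at least as up-to-date" means "last index is
    not smaller"."""

    def __init__(self, n, qc, qe):
        self.n, self.qc, self.qe = n, qc, qe
        self.last_index = [0] * n
        self.commit_index = 0

    def replicate(self, leader, followers):
        """Append one entry on `leader`, ship it to `followers`, commit on Qc acks."""
        new_index = self.last_index[leader] + 1
        acked = {leader, *followers}
        for node in acked:
            # A follower accepting this entry also holds every earlier one (Log Matching).
            self.last_index[node] = new_index
        if len(acked) >= self.qc:
            self.commit_index = new_index

    def can_win(self, candidate):
        """True if enough nodes (the candidate included) would grant it a vote to reach Qe."""
        mine = self.last_index[candidate]
        return sum(idx <= mine for idx in self.last_index) >= self.qe

    def unsafe_winners(self):
        """Nodes that could become leader while missing a committed entry."""
        return [node for node in range(self.n)
                if self.can_win(node) and self.last_index[node] < self.commit_index]


cluster = ToyCluster(n=5, qc=3, qe=3)
cluster.replicate(leader=0, followers=[1, 2])  # committed; nodes 3 and 4 lag behind
static = cluster.unsafe_winners()              # Qe(3): a laggard can't gather 3 votes
cluster.qc, cluster.qe = 5, 1                  # switch straight to Qc(5) / Qe(1)
dynamic = cluster.unsafe_winners()             # a laggard's own vote is now enough
print(static, dynamic)  # [] [3, 4]
```

Under the static Qc(3)/Qe(3) config the two lagging nodes can collect at most 2 votes, so neither can win. After the switch to Qe(1), either one can elect itself while missing the committed entry, which is the Leader Completeness violation the question describes.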