canonical / dqlite

Embeddable, replicated and fault-tolerant SQL engine.
https://dqlite.io

[question] Assigning from VOTER to SPARE and back to VOTER (and other scenarios) #576

Closed mdorier closed 5 months ago

mdorier commented 11 months ago

I am trying various scenarios to better understand the expected usage of some functions. The results in some of these scenarios don't match my expectations, so I am opening this thread in the hope of understanding where my assumptions are wrong.

Scenario 1

If raft_assign is local to the leader and does not communicate anything to other processes, I would expect the leader to simply stop contacting process 3, which will no longer receive heartbeats and will eventually try to get itself elected but fail, because it will never get a majority. But when it's reassigned as a voter, I would expect it to start getting heartbeats from the leader again and to catch up, which doesn't seem to be what happens.

Alternatively, I could also expect raft_assign to commit a configuration change on all the processes, including process 3, so that process 3 knows it should no longer expect heartbeats?

Scenario 2

This scenario is similar, but instead of calling raft_assign to assign process 3 the role of SPARE and then back to VOTER, I call raft_remove to remove process 3, and eventually call raft_add again followed by raft_assign to make it a voter. Note that I don't shut down the process: process 3 is still running. The result is the same as above: when process 3 is back to being a voter, it does not catch up and does not know who the leader is.

I will run with more tracing to see what's happening in particular in process 3, but in the meantime, do my scenarios make sense?

Note that I am using my own implementation of a raft_io backend, which I extensively tested (the scenarios above are testing some edge-cases). In particular I can spin up a new process, call raft_add followed by raft_assign to make it visible to the leader and to assign it the role of voter, and the new process does catch up on missing entries. The problem happens when I have an existing process running and I either assign it as spare then back to voter, or remove it then re-add it to the cluster.

Thanks!

MathieuBordere commented 11 months ago

Scenario 1


I'd expect process 3 to know who the leader is, and its state machine to catch up with the leader's state machine. I'd be interested in more tracing information to see what's going on.

Scenario 2


I would also expect process 3 to catch up and to know who the leader is.

mdorier commented 11 months ago

Ok I added tracing and noticed my code had mistakes: I had started with scenario 2 but I had forgotten to call raft_assign after raft_add to make the process a voter again, which explains why it wasn't catching up. When I had moved to scenario 1 (assign to spare then back to voter), I had another bug causing the second raft_assign not to be called (bad luck). Now that it's fixed, I don't see any problem anymore.
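
To illustrate why the missing raft_assign call matters, here is a minimal toy model in C of the membership changes discussed above. It is a sketch under assumptions, not the raft library itself: every toy_* name is hypothetical. It relies only on two documented behaviours of the library: raft_add adds a server with the spare role, and only voters and standbys get the log replicated to them.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. All toy_* names are hypothetical, not the raft library API. */
enum toy_role { TOY_ABSENT, TOY_SPARE, TOY_STANDBY, TOY_VOTER };

#define TOY_MAX_ID 8

struct toy_config {
    enum toy_role roles[TOY_MAX_ID + 1]; /* indexed by server id, slot 0 unused */
};

/* Start with an empty configuration: no server is a member. */
static void toy_init(struct toy_config *c)
{
    assert(c != NULL);
    for (int id = 0; id <= TOY_MAX_ID; id++) {
        c->roles[id] = TOY_ABSENT;
    }
}

static int toy_valid(int id)
{
    return id >= 1 && id <= TOY_MAX_ID;
}

/* Models raft_add: a newly added server always starts as a spare. */
static int toy_add(struct toy_config *c, int id)
{
    if (!toy_valid(id) || c->roles[id] != TOY_ABSENT) {
        return -1;
    }
    c->roles[id] = TOY_SPARE;
    return 0;
}

/* Models raft_remove: the server leaves the configuration. */
static int toy_remove(struct toy_config *c, int id)
{
    if (!toy_valid(id) || c->roles[id] == TOY_ABSENT) {
        return -1;
    }
    c->roles[id] = TOY_ABSENT;
    return 0;
}

/* Models raft_assign: change the role of an existing member. */
static int toy_assign(struct toy_config *c, int id, enum toy_role role)
{
    if (!toy_valid(id) || c->roles[id] == TOY_ABSENT || role == TOY_ABSENT) {
        return -1;
    }
    c->roles[id] = role;
    return 0;
}

/* Only voters and standbys have the log replicated to them. */
static int toy_receives_log(const struct toy_config *c, int id)
{
    return toy_valid(id) &&
           (c->roles[id] == TOY_VOTER || c->roles[id] == TOY_STANDBY);
}
```

In this model, calling toy_remove and then toy_add on server 3 leaves it as a spare, so toy_receives_log stays false until toy_assign promotes it again. That matches the symptom described for scenario 2 when the raft_assign call was forgotten.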

One thing I noticed with tracing enabled though is that when assigning a process as spare, the process is not notified of it (the new configuration is sent to the remaining voters/standby), so the process that is now a spare starts election rounds, which is useless and consumes cycles. Is there any way to prevent it from doing that? (maybe assigning it to standby then to spare)

MathieuBordere commented 11 months ago

> ... One thing I noticed with tracing enabled though is that when assigning a process as spare, the process is not notified of it (the new configuration is sent to the remaining voters/standby), so the process that is now a spare starts election rounds, which is useless and consumes cycles. Is there any way to prevent it from doing that? (maybe assigning it to standby then to spare)

Hmm that's interesting, let me think about that.

cole-miller commented 11 months ago

> One thing I noticed with tracing enabled though is that when assigning a process as spare, the process is not notified of it (the new configuration is sent to the remaining voters/standby), so the process that is now a spare starts election rounds, which is useless and consumes cycles. Is there any way to prevent it from doing that? (maybe assigning it to standby then to spare)

This is a good point! Indeed, assigning the standby role first should do the trick, but I'm not opposed to implementing a fix such that assigning directly to spare just works. We would have to have a special case on the leader that replicates the new configuration to any nodes that have been demoted to spare. That might end up having some tricky edge cases, especially around retrying, but it doesn't seem totally impractical.
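
The standby-first workaround can be sketched with a small simulation in C. This is a toy model built on the behaviour described in this thread (the new configuration reaches only the nodes that are voters or standbys in it), not code from the raft library. All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy simulation only. All sim_* names are hypothetical. */
enum sim_role { SIM_STANDBY, SIM_VOTER, SIM_SPARE };

#define SIM_N 3 /* number of nodes, indexed 0..SIM_N-1 */

struct sim_cluster {
    enum sim_role leader_view[SIM_N]; /* roles in the leader's configuration */
    enum sim_role self_view[SIM_N];   /* role each node believes it has */
};

/* Every node starts as a voter and knows it. */
static void sim_init(struct sim_cluster *c)
{
    assert(c != NULL);
    for (int n = 0; n < SIM_N; n++) {
        c->leader_view[n] = SIM_VOTER;
        c->self_view[n] = SIM_VOTER;
    }
}

/* The leader changes the role of node i. Assumption taken from the thread:
 * the new configuration is sent only to nodes that are voters or standbys
 * in it, so a node demoted straight to spare never hears about it. */
static void sim_assign(struct sim_cluster *c, int i, enum sim_role role)
{
    assert(i >= 0 && i < SIM_N);
    c->leader_view[i] = role;
    for (int n = 0; n < SIM_N; n++) {
        if (c->leader_view[n] != SIM_SPARE) {
            c->self_view[n] = c->leader_view[n];
        }
    }
}

/* A node keeps starting elections when it still believes it is a voter
 * while the leader treats it as a spare and sends it nothing. */
static bool sim_starts_elections(const struct sim_cluster *c, int i)
{
    return c->self_view[i] == SIM_VOTER && c->leader_view[i] == SIM_SPARE;
}
```

In this model, a direct voter-to-spare assignment leaves the demoted node believing it is a voter, so sim_starts_elections returns true. Going through standby first updates the node's own view before it is cut off, so it stays quiet after the second assignment. The fix discussed above would amount to also delivering the new configuration to the node being demoted to spare.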