keathley opened this issue 6 years ago
I'm certainly interested in working on this, so I'll set aside some time to read the paper again and catch up on some of the other links you mentioned here. My thought would be to review how etcd or Consul do it, and borrow their approach, since it is relatively battle-tested, rather than implementing our own interpretation of it (assuming we can't find a concrete "this is exactly how you do it" process).
Most real implementations use the "pre-vote" RPC along with the AddServer and RemoveServer RPCs. Both are defined in that "modifications for Raft consensus" document. Based on my research, that's what etcd does as well. I'm not sure hashicorp's raft has incorporated these changes at this point.
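To make the AddServer flow concrete, here is a rough sketch of the leader-side handling described in the thesis: catch the new server up first, then commit the configuration change. All of the names below (`MembershipSketch`, `catch_up/2`, `append_config_change/2`) are invented for illustration and are not part of this library or the thesis verbatim.

```elixir
defmodule MembershipSketch do
  @moduledoc """
  Illustrative sketch of leader-side AddServer handling: replicate the log to the
  new server until it is nearly caught up, then append a configuration-change entry.
  Nothing here reflects this library's actual internals.
  """

  def handle_add_server(%{role: :leader} = leader, new_peer) do
    with :ok <- catch_up(leader, new_peer),
         {:ok, leader} <- append_config_change(leader, {:add, new_peer}) do
      {:ok, leader}
    end
  end

  def handle_add_server(%{leader: known_leader}, _new_peer) do
    # Non-leaders redirect the caller to the current leader instead of applying the change.
    {:error, {:redirect, known_leader}}
  end

  # Placeholder: replicate log entries to the new peer until it is close to current.
  defp catch_up(_leader, _new_peer), do: :ok

  # Placeholder: append and commit the configuration-change entry.
  defp append_config_change(leader, _change), do: {:ok, leader}
end
```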
So after reading through all this I think it makes the most sense to use the explicit steps for initializing the cluster, adding peers, and removing peers. I think we can use something like:
```elixir
@type peer :: module() | {module(), node()}

@spec initialize_cluster(peer()) :: :ok | {:error, term()}
@spec add_peer(peer(), peer()) :: :ok | {:error, {:redirect, peer()}} | {:error, :not_leader}
@spec remove_peer(peer(), peer()) :: :ok | {:error, {:redirect, peer()}} | {:error, :not_leader}
```
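As a hedged sketch of how that API might be used to stand up a three-node cluster (the module name `MyApp.Counter` and the node names are invented for the example, and the functions are shown unqualified because no module has been decided yet):

```elixir
# Illustrative usage of the proposed API against a three-node cluster.
peer_a = {MyApp.Counter, :"a@host"}
peer_b = {MyApp.Counter, :"b@host"}
peer_c = {MyApp.Counter, :"c@host"}

:ok = initialize_cluster(peer_a)   # peer_a starts as a single-node cluster
:ok = add_peer(peer_a, peer_b)     # the leader adds peer_b
:ok = add_peer(peer_a, peer_c)     # the leader adds peer_c

# Calling a non-leader would return {:error, {:redirect, leader}} so the caller can retry there.
```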
I think I have a pretty good handle on how all of this works based on the thesis paper so I'll try to lay that out here when I get a chance.
So my take on this is that while an explicit API is nice to have (particularly for testing), it should be possible to make this automatic using node monitoring (where we effectively get the equivalent of add_peer and remove_peer events). This is how Swarm handles automatically forming a cluster, although it also supports blacklisting/whitelisting nodes so that only the nodes you want participating in the cluster can join.
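A minimal sketch of what that automatic layer could look like, assuming the explicit functions above end up in a top-level `Raft` module (that module name, the `ClusterManager` name, and the allowlist option are all assumptions, loosely mirroring Swarm's node filtering):

```elixir
defmodule ClusterManager do
  @moduledoc """
  Sketch: automatic membership changes driven by node monitoring.
  Assumes Raft.add_peer/2 and Raft.remove_peer/2 exist as proposed above.
  """
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # Subscribe to :nodeup / :nodedown messages from the distribution layer.
    :net_kernel.monitor_nodes(true)

    state = %{
      machine: Keyword.fetch!(opts, :machine),
      allowlist: Keyword.get(opts, :allowlist, :any)
    }

    {:ok, state}
  end

  @impl true
  def handle_info({:nodeup, node}, state) do
    # Treat a new node as an add_peer event; :redirect / :not_leader errors would need retry logic.
    if allowed?(node, state.allowlist) do
      Raft.add_peer({state.machine, Node.self()}, {state.machine, node})
    end

    {:noreply, state}
  end

  def handle_info({:nodedown, node}, state) do
    if allowed?(node, state.allowlist) do
      Raft.remove_peer({state.machine, Node.self()}, {state.machine, node})
    end

    {:noreply, state}
  end

  defp allowed?(_node, :any), do: true
  defp allowed?(node, allowlist), do: node in allowlist
end
```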
Reading through the paper's section on cluster membership changes, I didn't see anything that indicated this would be a problem, but I haven't dug into the internals of this library yet, so I don't know if there is a constraint imposed by the implementation.
I spent some time yesterday putting together a distributed process registry based on Raft, and by far the thing that stood out to me was that initializing the cluster is a manual process, and there isn't a great place to put that in our own applications. It would be preferable if the state machine server worked to form a cluster on its own. This behaviour would need to be configurable, because there are obviously times when you may want to manually manage the configuration (testing, again, is an example), but I think for a library like this to be successful, it needs to be easy to use by default and offer escape hatches for the things that may need tweaking in some situations.
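One possible shape for that escape hatch, purely as an assumption about how the configuration might look (the application name and keys below are invented):

```elixir
# in config/config.exs
# Hypothetical config: automatic membership by default,
# with :manual as the escape hatch for explicit initialize_cluster/add_peer calls.
config :raft,
  membership: :automatic,
  allowlist: [:"a@host", :"b@host", :"c@host"]
```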
Thoughts?
I totally agree that we need to make cluster membership changes automatic. I can work on implementing the underlying functions such as initialize_cluster, add_peer, etc. so that we can leverage them from a higher-level abstraction that automatically calls them based on node monitoring.
Hey guys, this may be a dumb question, but wouldn't it be possible to just enumerate Node.list() and, as nodes join the cluster, have the already existing AppendEntries RPC get them to catch up? I could see this being a problem if we have lots of nodes, but from what I gather that's not the case with Raft clusters.
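For what it's worth, the simplest version of that idea would be something like the snippet below, which assumes every node runs the state machine under the same module name (`MyApp.Counter` is made up for the example). New nodes would then rely on the leader's normal AppendEntries replication to catch up, per the question above.

```elixir
# Hypothetical: derive the peer set from whatever nodes are currently connected.
peers = for node <- [Node.self() | Node.list()], do: {MyApp.Counter, node}
```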
We need to support adding and removing peers in the Raft cluster. While this is described briefly in section 6 of the Raft paper, I feel the explanation is underspecified, specifically with regard to the rejection of RequestVote RPCs and the process of "catching up" new peers before initiating joint consensus.
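For context, the rule in the thesis that addresses the RequestVote concern is roughly "ignore vote requests while you believe a current leader exists", i.e. within the minimum election timeout of last hearing from the leader. A hedged sketch of that check, with none of the names taken from this library:

```elixir
defmodule PreVoteSketch do
  @moduledoc """
  Illustrative only: a server ignores RequestVote RPCs while it believes a current
  leader exists, so a removed or partitioned server cannot disrupt the cluster.
  """

  @min_election_timeout_ms 150

  def handle_request_vote(%{last_leader_contact: last_contact} = state, request) do
    elapsed = System.monotonic_time(:millisecond) - last_contact

    if elapsed < @min_election_timeout_ms do
      # A current leader is presumed alive; ignore the vote request.
      {:reject, state}
    else
      grant_or_deny_vote(state, request)
    end
  end

  # Placeholder for the normal vote-granting logic (term and log up-to-date checks).
  defp grant_or_deny_vote(state, _request), do: {:grant, state}
end
```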
I'm reading through the raft mailing list and other resources that I'll link here in order to get a better feel for potential improvements to the solution. Right now my intuition is that we should look at using the AddServer and RemoveServer RPCs from the "Ongaro thesis", which I've linked below.

Research / Links