Closed jschoch closed 11 years ago
All nodes in a raft consensus group are equal. No roles are hardcoded. If a leader fails another leader will be elected by the remaining members of the group. If all nodes are up to date, election is essentially random and any of the alive nodes can become leader.
There is no way to designate a specific leader, and you really wouldn't need/want to in most situations. The whole point of leader election is to allow for HA and consistency. If you tie the leaders to given nodes, when those nodes go down your system is down.
Well, I guess instead of
get_leader(peer1)
I'd like
get_leader(role, peer)
This way you can have different servers be leaders for different roles. Failover and HA would work exactly the same, but a single failure would only affect a single role and require an election for that role; all other roles would be unaffected if the failed server was not their leader at the time.
@jschoch Unfortunately that's not how the algorithm works. However, you can certainly start up multiple consensus groups on the same nodes. Each peer is only a few Erlang processes, so you could theoretically run thousands of them. You'd likely hit memory/bandwidth/disk limits well before exceeding the number of processes Erlang allows you to run.
Once the peers are started you can then do the following:
get_leader(group1_peer1).
get_leader(group2_peer1).
This has the additional benefit of allowing each of those groups to use different state machines and store data in different log files.
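To illustrate the idea, starting two disjoint groups on the same node might look roughly like this. Note that `start_peer/2` and the state machine module names below are hypothetical placeholders, not rafter's actual API; check the rafter source for the real start functions:

```erlang
%% Sketch only: two independent consensus groups on the same node.
%% start_peer/2 stands in for whatever rafter exposes to start a
%% named peer with a given state machine module (an assumption).
start_groups() ->
    %% Group 1: three peers, each just a handful of Erlang processes.
    [rafter:start_peer(Name, my_kv_statemachine)
     || Name <- [group1_peer1, group1_peer2, group1_peer3]],
    %% Group 2: same node, different state machine and log files.
    [rafter:start_peer(Name, my_queue_statemachine)
     || Name <- [group2_peer1, group2_peer2, group2_peer3]],
    ok.
```

Each group then elects its own leader independently, which is what makes the `get_leader(group1_peer1)` / `get_leader(group2_peer1)` calls above meaningful.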
@andrewjstone Please elaborate on this in the README. Although one can figure out how to give disjoint groups a spin, the correctness of doing so isn't obvious and may be implementation dependent.
Furthermore, it may be very handy if an arbitrary term were allowed as a peer name (e.g. `{group, I, peer, J}` where `I` and `J` are integers); this capability would liberate client code from tons of `integer_to_list` and `list_to_atom` calls. The mentioned behaviour is implementable with minimal effort using `gproc` in non-distributed mode and the `{via, Module, ViaName}` feature of `gen_fsm`.
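As a sketch of that suggestion: a peer registered through `gproc` with the `{via, Module, ViaName}` mechanism of `gen_fsm` can be addressed by an arbitrary term instead of an atom. Here `rafter_consensus_fsm` and the `get_leader` event are stand-ins for whichever `gen_fsm` callback module and events rafter actually uses; treat them as assumptions:

```erlang
%% Sketch: naming a peer by an arbitrary term via gproc.
%% {n, l, Term} is gproc's key format for a local unique name.
start_peer(I, J, Args) ->
    Name = {group, I, peer, J},
    gen_fsm:start_link({via, gproc, {n, l, Name}},
                       rafter_consensus_fsm, Args, []).

%% Later calls use the same term, no atom conversions needed:
get_leader(I, J) ->
    Name = {group, I, peer, J},
    gen_fsm:sync_send_all_state_event({via, gproc, {n, l, Name}},
                                      get_leader).
```

The only rafter-side change this requires is constructing `FsmRef`s as `{via, gproc, Key}` tuples instead of registered atoms.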
Hi @sumerman,
I do plan at some point to write about, and work on a consensus group manager that will allow handling multiple disjoint consensus groups. I don't ever expect them to coordinate among themselves however. They would be used instead for naturally partitioned data. For instance each consensus group would handle managing data for one specific business customer. You could do aggregate calculations on the groups together, but the aggregates wouldn't be consistent.
I'm actually not even sure this manager should be part of rafter rather than part of the application using rafter. This is sort of beyond the scope of implementing the raft algorithm, so I've punted on it until the rest of the code is production ready, which will probably be a while considering it still needs compaction, anti-entropy, better log performance, etc.
As far as using arbitrary terms goes, I agree that this would indeed limit type conversions when implementing a non-Erlang frontend. However, the code is simpler without gproc. That said, I'm open to changing it. I'd probably ignore the I and J, since there is only one group as described above, and just use a term as the name directly. If you want to send a patch I'll take a look.
I'm thinking having multiple leaders in a cluster for different roles would be a good way to balance load around a cluster. Any thoughts on how to associate a role to a cluster so you could do something like
Forgive my syntax, I'm learning Elixir and not quite up to speed with Erlang:

roles = [logger: [server1, server2, server3], queue: [server3, server2, server1], cron: [server2, server1, server3]]
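For reference, the same role-to-peers mapping would be written in Erlang as a property list (server and role names are taken from the sketch above, purely illustrative):

```erlang
%% The role-to-peers mapping as an Erlang proplist.
Roles = [{logger, [server1, server2, server3]},
         {queue,  [server3, server2, server1]},
         {cron,   [server2, server1, server3]}].
```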
Is this possible, or are the peer1..5 atoms hardcoded and assumed to be in a single cluster?