Closed danburkert closed 9 years ago
oops
This looks good to me and is accurate according to our discussions.
Because messages are passed asynchronously between Server instances, a Server could get into a situation where multiple events are ready to be dispatched to a single remote Server. In this situation, the Server will replace the existing event with the new event, except in one special circumstance: if the new and existing messages are both AppendEntryRequests with the same term, then the new message will be dropped.
Yes, this is correct: in general, a newer request makes the older ones less meaningful.
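For anyone following along, the replacement rule quoted above could be sketched roughly like this. It is only an illustration, not the library's code: Outbox, Message, the u64 peer id, and enqueue are hypothetical names, and only the fields needed for the rule are modelled.

```rust
use std::collections::HashMap;

/// Hypothetical message type; the real library's messages carry more fields.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    AppendEntryRequest { term: u64, entries: Vec<String> },
    RequestVoteRequest { term: u64 },
}

/// Holds at most one pending message per remote peer (peers are keyed by an
/// assumed u64 id).
#[derive(Default)]
struct Outbox {
    pending: HashMap<u64, Message>,
}

impl Outbox {
    /// Queue `new` for `peer`. Returns true if `new` is now the pending
    /// message, false if it was dropped.
    fn enqueue(&mut self, peer: u64, new: Message) -> bool {
        // Special case: both messages are AppendEntryRequests with the same
        // term, so the existing message stays and the new one is dropped.
        if let (
            Some(Message::AppendEntryRequest { term: old_term, .. }),
            Message::AppendEntryRequest { term: new_term, .. },
        ) = (self.pending.get(&peer), &new)
        {
            if old_term == new_term {
                return false;
            }
        }
        // Every other case: the new message replaces whatever was pending.
        self.pending.insert(peer, new);
        true
    }
}
```

With this shape, two AppendEntryRequests for term 2 queued for the same peer leave only the first one pending, while a later request for term 3 (or any other kind of message) replaces it.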
Disabling RPC as in not responding at all?
@posix4e: the only way to ensure that a read is not stale is to 1) serve it from a master, 2) have the master delay responding to the request until after the next successful append-entries event. This (short of some optimizations which don't fundamentally change the situation) is the only way to guarantee a non-stale read, since raft allows multiple concurrent masters. This is, incidentally, what Aphyr's Call Me Maybe analysis revealed as weaknesses in some raft implementations. This raft implementation does not properly support this right now.
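A rough sketch of the "delay the read until the next successful append-entries" idea, for illustration only. ReadBarrier, ReadId, and the round counter are hypothetical names, and "successful" is assumed here to mean that a majority acknowledged the round. As noted above, the library does not implement this today.

```rust
/// Hypothetical identifier for a client read request.
type ReadId = u64;

/// Tracks reads that must wait for an append-entries round that started
/// after they arrived.
struct ReadBarrier {
    /// Number of append-entries rounds the leader has started so far.
    rounds_started: u64,
    /// Waiting reads, each tagged with the first round that counts for it.
    waiting: Vec<(u64, ReadId)>,
}

impl ReadBarrier {
    fn new() -> Self {
        ReadBarrier { rounds_started: 0, waiting: Vec::new() }
    }

    /// A read arrives. Rounds already in flight do not count, so the read
    /// waits for the next round to be started.
    fn on_read(&mut self, id: ReadId) {
        self.waiting.push((self.rounds_started + 1, id));
    }

    /// The leader starts a new append-entries round and gets its number.
    fn start_round(&mut self) -> u64 {
        self.rounds_started += 1;
        self.rounds_started
    }

    /// Round `round` succeeded (assumption: acknowledged by a majority).
    /// Returns the reads that may now be answered.
    fn on_round_succeeded(&mut self, round: u64) -> Vec<ReadId> {
        let (ready, still_waiting): (Vec<_>, Vec<_>) = self
            .waiting
            .drain(..)
            .partition(|(needed, _)| *needed <= round);
        self.waiting = still_waiting;
        ready.into_iter().map(|(_, id)| id).collect()
    }
}
```

The point of tagging each read with `rounds_started + 1` is that a round which began before the read arrived proves nothing about leadership at the time of the read, so only a later round releases it.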
That sounds great! Any plans for compaction?
@posix4e With this new architecture that is definitely in the cards. We haven't implemented it yet but because of how we abstract the log it will be something consuming applications can implement.
I pulled some of these comments into the source with 81c4817036014ae0b7743c6d2449e80dc830184e. @danburkert Should we close this?
Sounds good to me.
This is an overview of the architecture of the raft library. Familiarity with the Raft Consensus Algorithm is assumed.

From a user perspective, there are three fundamental components of the raft library: the StateMachine, the Server, and the Client. Subsequent sections will detail these components individually, and their interactions.

StateMachine

StateMachine is a Rust trait which users of the raft library must provide. A StateMachine is a single instance of a distributed application. It is the raft library's responsibility to take commands from the Client and apply them to each StateMachine instance in a globally consistent order.

The StateMachine interface is intentionally generic so that any distributed application needing consistent state can be built on it. For instance, a distributed hash table application could implement StateMachine, with commands corresponding to insert and remove. The raft library would guarantee that all replicas see the insert and remove commands in the same order.

Server

Server is a Rust type which is responsible for coordinating with other remote Server instances, responding to commands from the Client, and applying commands to a local StateMachine replica. A Server may be a Leader, Follower, or Candidate at any given time, as described by the Raft Consensus Algorithm.

Internals

In order to implement the Raft Protocol, the Server must respond to events (messages from other Server or Client instances, as well as timeouts) in highly context-dependent ways. While implementing the Server, we found that mixing Raft Protocol logic with network logic led to incomprehensible and unmaintainable code. Accordingly, we decomposed the problem: an inner Replica type is responsible for the Raft Protocol logic, which leaves the Server responsible only for receiving and dispatching events.

Replica Implementation

The Replica is a state machine (not to be confused with the StateMachine trait) which implements the logic of the Raft Protocol. A Replica receives events from the local Server. The set of possible events is specified by the Raft Protocol.

In response to receiving an event, the Replica may mutate its own state, apply a command to the local StateMachine, or return an event to be sent to one or more remote Server or Client instances.

Server Implementation

The Server is responsible for receiving events from remote Server or Client instances, as well as setting election and heartbeat timeouts. When an event is received, it is applied to the local Replica. The Replica may optionally return a new event, which must be dispatched either to the Server or Client which sent the original event, or to all Server instances.

Because messages are passed asynchronously between Server instances, a Server could get into a situation where multiple events are ready to be dispatched to a single remote Server. In this situation, the Server will replace the existing event with the new event, except in one special circumstance: if the new and existing messages are both AppendEntryRequests with the same term, then the new message will be dropped.

Client

The Client allows users of the raft library to connect to remote Server instances and issue commands to be applied to the StateMachine.
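
To make the distributed hash table example from the StateMachine section more concrete, here is a minimal sketch. The trait below is a hypothetical stand-in and is not the raft library's actual StateMachine signature; TableCommand, HashTable, and replay are likewise invented for illustration. It shows only the property the overview relies on: replicas that apply the same commands in the same order end up in the same state.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the library's StateMachine trait.
trait StateMachine {
    type Command;
    /// Apply one command that has been ordered by the consensus layer.
    fn apply(&mut self, command: Self::Command);
}

/// Commands understood by the example hash table.
#[derive(Debug, Clone, PartialEq)]
enum TableCommand {
    Insert(String, String),
    Remove(String),
}

/// One replica of the distributed hash table.
#[derive(Debug, Default, PartialEq)]
struct HashTable {
    entries: HashMap<String, String>,
}

impl StateMachine for HashTable {
    type Command = TableCommand;

    fn apply(&mut self, command: TableCommand) {
        match command {
            TableCommand::Insert(key, value) => {
                self.entries.insert(key, value);
            }
            TableCommand::Remove(key) => {
                self.entries.remove(&key);
            }
        }
    }
}

/// Build a fresh replica by applying an ordered command log from the start.
fn replay(log: &[TableCommand]) -> HashTable {
    let mut table = HashTable::default();
    for command in log {
        table.apply(command.clone());
    }
    table
}
```

Calling replay twice on the same log yields two equal HashTable values, which is the single-process analogue of every Server applying the same ordered commands to its local replica.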