Hoverbear / old-raft-rs

[Incomplete] A Raft implementation in Rust
https://hoverbear.github.io/raft-rs/raft/
MIT License
266 stars 41 forks source link

Design Notes #24

Closed danburkert closed 9 years ago

danburkert commented 9 years ago

This is an overview of the architecture of the raft library. Familiarity with the Raft Consensus Algorithm is assumed.

From a user perspective, there are three fundamental components of the raft library: the StateMachine, the Server, and the Client. Subsequent sections will detail these components individually, and their interactions.

StateMachine

StateMachine is a Rust trait, which users of the raft library must provide. A StateMachine is a single instance of a distributed application. It is the raft libraries responsibility to take commands from the Client and apply them to each StateMachine instance in a globally consistent order.

The StateMachine is interface is intentionally generic so that any distributed application needing consistent state can be built on it. For instance, a distributed hash table application could implement StateMachine, with commands corresponding to insert, and remove. The raft library would guarantee that the same order of insert and remove commands would be seen by all replicas.

Server

Server is a Rust type which is responsible for coordinating with other remote Server instances, responding to commands from the Client, and applying commands to a local StateMachine replica. A Server may be a Leader, Follower, or Candidate at any given time as described by the Raft Consensus Algorithm.

Internals

In order to implement the Raft Protocol, the Server must respond to events (messages from other Server or Client instances, as well as timeouts) in highly context-dependent ways. We found while trying to implement the Server that mixing Raft Protocol logic with network logic led to incomprehensible and unmaintainable code. Accordingly, we decomposed the problem into an inner Replica type which is responsible for the Raft Protocol logic, which left the Server only responsible for receiving and dispatching events.

Replica Implementation

The Replica is a state-machine (not to be confused with the StateMachine trait) which implements the logic of the Raft Protocol. A Replica receives events from the local Server. The set of possible events is specified by the Raft Protocol:

Event = AppendEntriesRequest | AppendEntriesResponse
      | RequestVoteRequest | RequestVoteResponse
      | ElectionTimeout | HeartbeatTimeout
      | ClientCommand

In response to receiving an event, the Replica may mutate its own state, apply a command to the local StateMachine, or return an event to be sent to one or more remote Server or Client instances.

Server Implementation

The Server is responsible for receiving events from remote Server or Client instances, as well as setting election and heartbeat timeouts. When an event is received, it is applied to the local Replica. The Replica may optionally return a new event which must be dispatched to either the Server or Client which sent the original event, or to all Server instances.

Because messages are passed asyncronously between Server instances, a Server could get into a situation where multiple events are ready to be dispatched to a single remote Server. In this situation, the Server will replace the existing event with the new event, except in one special circumstance: if the new and existing messages are both AppendEntryRequests with the same term, then the new message will be dropped.

Client

The Client allows users of the raft library to connect to remote Server instances and issue commands to be applied to the StateMachine.

danburkert commented 9 years ago

oops

Hoverbear commented 9 years ago

This looks good to me and is accurate according to our discussions.

Because messages are passed asyncronously between Server instances, a Server could get into a situation where multiple events are ready to be dispatched to a single remote Server. In this situation, the Server will replace the existing event with the new event, except in one special circumstance: if the new and existing messages are both AppendEntryRequests with the same term, then the new message will be dropped.

Yes this is correct because in general newer requests will render older ones not as meaningful.

Hoverbear commented 9 years ago

Disabling RPC as in not responding at all?

danburkert commented 9 years ago

@posix4e: the only way to ensure that a read is not stale is to 1) serve it from a master, 2) have the master delay responding to the request until after the next successful append-entries event. This (short of some optimizations which don't fundamentally change the situation) is the only way to guarantee a non-stale read, since raft allows multiple concurrent masters. This is, incidentally, what Aphyr's Call Me Maybe analysis revealed as weaknesses in some raft implementations. This raft implementation does not properly support this right now.

posix4e commented 9 years ago

That sounds great! Any plans for compaction?

Hoverbear commented 9 years ago

@posix4e With this new architecture that is definitely in the cards. We haven't implemented it yet but because of how we abstract the log it will be something consuming applications can implement.

Hoverbear commented 9 years ago

I pulled some of these comments into the source with 81c4817036014ae0b7743c6d2449e80dc830184e. @danburkert Should we close this?

danburkert commented 9 years ago

Sounds good to me.