Hoverbear / old-raft-rs

[Incomplete] A Raft implementation in Rust
https://hoverbear.github.io/raft-rs/raft/
MIT License
266 stars 41 forks source link

Log missing `last_applied_index()` #100

Open Hoverbear opened 9 years ago

Hoverbear commented 9 years ago

@tschottdorf will likely expand on this as he brought it up on IRC.

The log currently does not store applied_index and last_commited_index and this will likely be necessary if, say, a machine is rebooted and it needs to know what's been committed and applied.

We noted that both are needed because we might be committing say 100 entries but the state machine may take time to apply them.

tbg commented 9 years ago

Upon further reflection, I have to backtrack a little. I think the committed index isn't strictly needed. Say all nodes in a cluster spontaneously reboot. They'll manage to elect a leader (that only depends on last log index and last term), so once that leader manages to append a command to a majority, the commit index is reclaimed. The problem with the current code (including the PR #99) will be that in such a situation, when the server first tries to append (in absence of further information about the peer, it's wise to try to append the most recent entry as a probe), it'll (probably) use 0 as the index of the "last append", and that will be what's returned with the append's rejection, leading to a complete replay of the log. The Raft thesis talks about this in some detail: finding the last position at which (index, term) match is what's necessary. That means doing a binary search or something like that. Not hard to implement (just return the index at which the leader is trying to append divided by two as a rejection hint). Plus, any follower who didn't crash still has the in-memory state, so this is really only a problem when multiple nodes zap out at the same time. So I think we should instead make sure this works, add a few tests and then call it a day. It's good to keep the interface small and using the log is expensive, so it's good to avoid it.

danburkert commented 9 years ago

Not storing the last_applied and the last_committed is intentional. On startup we should be applying all log entries to the state machine, starting at either log index 1, or the the most recent snapshot. This allows state machine implementations to be completely volatile. We have not implemented this yet, since we don't have a non-volatile log implementation anyways.

To facilitate efficient state machines which handle their own durability, we may want to add a method to StateMachine that Consensus could use to ask the state machine its latest applied index (although right now we don't expose log indices at all to StateMachine).

danburkert commented 9 years ago

it'll (probably) use 0 as the index of the "last append"

That would be a bug, the master should use the index of the last commit in its log, regardless of whether it committed it or another master did.

danburkert commented 9 years ago

I was a bit vague in my first comment regarding applying logs at startup. You can't actually do that-- Consensus must wait for the commit index to roll forward (which it will after the first commit).

tbg commented 9 years ago

As for the applied index, that's a bit more complicated. It's really only the StateMachine itself that can tell whether it's applied something or not (it could apply and then crash, so that we'd re-apply the thing again). Of course the information is still valuable. Without it, we may not know what's already been applied. The current code with an actual durable log would simply re-apply everything from zero when the in-memory state is lost. If we persisted the applied_index (it should really be called definitely_applied_index) after apply() returns, we know that we'll erroneously re-apply at most one entry if something goes wrong (i.e. we apply, the state machine maybe applies it, but we never get to update our entry). That knowledge can be used by implementations of StateMachine to avoid extra work: They just need to cache the last command's ID (atomically with its execution). We're currently not passing the log index to apply(), but if we add hybrid logical clock timestamps (which I think we should) then those would be the logical candidate for this task since they're interesting to StateMachine in their own right anyways, and other than that they're just like a log index.

tbg commented 9 years ago

To facilitate efficient state machines which handle their own durability, we may want to add a method to StateMachine that Consensus could use to ask the state machine its latest applied index

that's also a valid way to go about it. I haven't really considered wanting to replay the whole log (or snapshotting) because in my mind, the state machine is usually a database. Querying the state machine for its last applied index does the same thing, though, and doesn't bother the log, so I'm for it.

danburkert commented 9 years ago

Chapter 5 of Diego's thesis has a very thorough discussion of snapshotting techniques, I think we should probably stick close the suggestions made there.

tbg commented 9 years ago

it'll (probably) use 0 as the index of the "last append"

That would be a bug, the master should use the index of the last commit in its log, regardless of whether it committed it or another master did. I was a bit vague in my first comment regarding applying logs at startup. You can't actually do that-- Consensus must wait for the commit index to roll forward (which it will after the first commit).

Not sure I understand. When the leader after reboot tries to commit its first entry (thus re-discovering the commit index) it needs to figure out the position in the peer's log at which they match the leader. At this point it doesn't know anything but its own log's length, so it has to do a (binary or dumb one-element-from-back) search over (a part of) the peer's log, for each peer.

tbg commented 9 years ago

Recap of the above is that adding last_applied_index() to the state machine interface is a good idea. Implementations which don't persist state just respond with zero (so that they get a snapshot/log playback) but others may want to pick up where they left off and can return that index. The commit index isn't needed. I'm going to rename the issue accordingly.

edit: since I can't @Hoverbear would you do us the honor?