Hoverbear / old-raft-rs

[Incomplete] A Raft implementation in Rust
https://hoverbear.github.io/raft-rs/raft/
MIT License

Handle pending proposals at leader after transition to follower #115

Open tbg opened 8 years ago

tbg commented 8 years ago

When a client proposes a command (at the leader) but the leader steps down before the command commits, the client is left waiting. Simply telling the waiting client that its proposed command failed isn't a great option: The command may still commit (if it has been replicated to another node, it's likely that that node has obtained leadership).

One convenient way out is the following: we already store internally not only the pending command, but also the log index i of that command. Once some command commits at index i, it's guaranteed that either a) the committed command is the one we were waiting for, or b) our command will never commit at any point in the future. So a node that has stepped down can abort any pending command once the commit index reaches that command's index without it committing.

Currently the pending proposals are a property of LeaderState, so that would have to change. Likely makes sense to move them into Consensus proper.
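A minimal sketch of the mechanism described above, assuming hypothetical `Consensus`, `Outcome`, and `on_commit` names (the actual crate's types will differ): pending proposals live on `Consensus` keyed by the log index they were appended at, and every commit notification decides all proposals at or below the committed index.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: pending proposals keyed by log index, stored on
// Consensus rather than LeaderState so they survive a step-down.
type LogIndex = u64;
type ClientId = u64;

pub struct Consensus {
    // Proposals we are still waiting on, keyed by the log index at which
    // they were proposed.
    pending: BTreeMap<LogIndex, ClientId>,
}

#[derive(Debug)]
pub enum Outcome {
    Committed(ClientId),
    Aborted(ClientId),
}

impl Consensus {
    /// Called whenever an entry commits. `ours` is true when the entry at
    /// `index` is the command that we proposed on the client's behalf.
    pub fn on_commit(&mut self, index: LogIndex, ours: bool) -> Vec<Outcome> {
        // Any pending proposal at an index <= the committed index is now
        // decided: either it is the committed entry, or a different entry
        // took its slot and it will never commit.
        let decided: Vec<LogIndex> =
            self.pending.range(..=index).map(|(&i, _)| i).collect();
        let mut outcomes = Vec::new();
        for i in decided {
            let client = self.pending.remove(&i).unwrap();
            if i == index && ours {
                outcomes.push(Outcome::Committed(client));
            } else {
                outcomes.push(Outcome::Aborted(client));
            }
        }
        outcomes
    }
}
```

Keeping the map on `Consensus` (rather than `LeaderState`) means the entries survive the leader-to-follower transition, so the node can still resolve them as the commit index advances.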

danburkert commented 8 years ago

This gets at the bigger issue of providing exactly-once semantics for clients. It's necessary to have exactly-once client operations for the consensus implementation to provide linearizability (in the case of Raft it's necessary and sufficient, since vanilla Raft already guarantees serializability). Given that the case you outline is only one of many where Raft fails at exactly-once, I'm not sure it's worth special-casing, unless this specific scenario leads to bigger issues? Providing robust exactly-once semantics in any failure scenario would be really nice, though (probably by using something like RIFL).

danburkert commented 8 years ago

Actually exactly-once might not be necessary for linearizability, it may be a bit stronger than required? CC #106

tbg commented 8 years ago

It's the "at-least-once" portion of "exactly once". Agreed that with proper replay protection, it's kosher to simply cancel the command when leadership is lost, so that the client can retry. That's likely a double proposal, but deduplication would catch it (if we have it). With the simple client model we have right now, replay protection is relatively easy: one cached response per client ID with a leader-determined timestamp, cleared on each new proposition (and after a configurable timeout, computed from the leader timestamp attached to heartbeats, to account for absent clients).
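The cache described above could be sketched roughly as follows; the `ReplayCache` name, the byte-vector response type, and the `u64` leader timestamp are all assumptions for illustration, not the crate's actual API.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a simple replay-protection cache: one cached
// response per client, stamped with a leader-assigned timestamp and
// evicted after a configurable timeout.
type ClientId = u64;
type LeaderTimestamp = u64; // assumed monotone, attached to heartbeats

pub struct ReplayCache {
    timeout: u64,
    entries: HashMap<ClientId, (LeaderTimestamp, Vec<u8>)>,
}

impl ReplayCache {
    /// A duplicate proposal: return the cached response, if any.
    pub fn lookup(&self, client: ClientId) -> Option<&Vec<u8>> {
        self.entries.get(&client).map(|(_, resp)| resp)
    }

    /// A new proposal from `client` clears its previous cached response.
    pub fn on_new_proposal(&mut self, client: ClientId) {
        self.entries.remove(&client);
    }

    /// Record the response once the client's command commits.
    pub fn on_commit(&mut self, client: ClientId, now: LeaderTimestamp, response: Vec<u8>) {
        self.entries.insert(client, (now, response));
    }

    /// Evict entries for absent clients, driven by the leader timestamp
    /// carried on heartbeats.
    pub fn expire(&mut self, now: LeaderTimestamp) {
        let timeout = self.timeout;
        self.entries.retain(|_, (ts, _)| now.saturating_sub(*ts) < timeout);
    }
}
```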

tbg commented 8 years ago

> Actually exactly-once might not be necessary for linearizability, it may be a bit stronger than required? CC #106

The "definition" of linearizability struggles with situations like these, but for all intents and purposes that's what clients will want and should get. It gets interesting as the client semantics become more abstract, but it's always been a pain dealing with this in the client when the right primitives weren't there.

danburkert commented 8 years ago

I don't think it's possible to provide "at-least-once" behavior in a distributed system without allowing retries, but correct me if I'm wrong. The original proposal in this thread handles the case of a master being replaced by another master with a later term during proposal replication, but it doesn't handle the case of the master being partitioned from the client during proposal replication.

tbg commented 8 years ago

Sorry, was being imprecise (if the message to the leader gets lost, of course there's no response). I agree that it seems better to just have proper replay protection and to cancel outstanding proposals when the leader steps down (redirecting to the new leader).
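The alternative agreed on here, cancel-and-redirect on step-down, could look roughly like this; `on_step_down`, `Reply`, and the node-id type are hypothetical names for illustration.

```rust
use std::collections::BTreeMap;

type LogIndex = u64;
type ClientId = u64;
type NodeId = u64;

#[derive(Debug)]
pub enum Reply {
    /// The proposal was cancelled; retry against `new_leader` if known.
    /// Replay protection (deduplication) makes the retry safe even if the
    /// original proposal ends up committing anyway.
    NotLeader { new_leader: Option<NodeId> },
}

pub struct Consensus {
    pending: BTreeMap<LogIndex, ClientId>,
}

impl Consensus {
    /// Called on transition from leader to follower: drain every
    /// outstanding proposal and answer each waiting client with a
    /// redirect to the new leader, if one is known.
    pub fn on_step_down(&mut self, new_leader: Option<NodeId>) -> Vec<(ClientId, Reply)> {
        let drained = std::mem::take(&mut self.pending);
        drained
            .into_values()
            .map(|client| (client, Reply::NotLeader { new_leader }))
            .collect()
    }
}
```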