NicolasT / kontiki

An implementation of the Raft consensus protocol
BSD 3-Clause "New" or "Revised" License
122 stars 15 forks source link

Sending log entries without hearbeat (was:MonadLog vs CLogEntries) #8

Open qrilka opened 10 years ago

qrilka commented 10 years ago

While thinking about a patch for #7 I've stumbled upon a problem: at the moment I don't really understand current kontiki API. I.e. I don't understand whether client code should use MonadLog or CLogEntries. The first seems to provide a way to log any Entry appearing on a node. But if CLogEntries does not seem to be uniforrm - it could be received only on Followers and for Leaders some special handling should be used. Is it a problem with my API understanding or there is some inconsistencies in this API? BTW haddocks could help here too :)

NicolasT commented 10 years ago

The MonadLog interface is used by the state machine to access the log as if it's an immutable/pure datastructure somewhere, although in real code this will need IO etc. While the state-machine performs a single transition/step, the log is actually frozen.

Similar to other Commands, modifications to the log could be the result of a state transition, hence the CLogEntries and CTruncateLog Commands.

So:

The 'udp' demo might give some more insight.

You're right about docs, I should get to #6 asap.

qrilka commented 10 years ago

Actually 'udp' demo seems to be the main source if my current undestanding of kontiki. And it is quite confusing that new values are being generated inside the Leader itself - it is quite far from Raft intended usage. And for #7 correction the problem with CLogEntries uniformity (which I describe above) compicate the situation: handleHeartbeatTimeout generates CLogEntries and could generate something like CCommitted but it is code from kontiki library and so Leader can not observe this CCommitted otherwise than to process CSend and take information about commited index out of it. And that seems to be very unintuitive control flow. The main problem is that every node (including Leader) needs both log entries and commited index for correct Raft use. Log entries are easy as Leader is receiving it (out of somewhere) but commited index is controlled by Raft implementation so Leader have to get it somehow from Raft itself.

jkozlowski commented 10 years ago

I forgot to watch this repo when I forked, so I missed this discussion.

That was my confusion as well that a client request to log an entry does not seem to be encoded in the API, so the leader's log is updated as if under his feet, implicitly by the driver that uses the model.

Also, from my current understanding, there can only be a single transaction in flight really, as there is no notion of "issue a write" and "write completed", so the driver needs to synchronously perform all commands. After reading the paper, I'm not sure if it's the Raft's limitation or the current implementation. I could see that it should be possible to implement.

So how do we proceed? I have this weekend free for some Haskell hacking, I was thinking about testing in the spirit of http://www.haskellforall.com/2013/11/test-stream-programming-using-haskells.html, but if the APIs going to change, I suppose it's a bit pointless...

NicolasT commented 10 years ago

Regarding CSetCommitIndex: this is issued upon every AppendEntriesRequest received on a follower, or on a leader whenever an AppendEntriesResponse is received (the leader can recalculate the commitindex in there, since that's where it keeps track of the log indices stored on followers). The discussion about this should go into #7 though :-)

NicolasT commented 10 years ago

Regarding the adding of log entries in the udp demo within handleHeartbeatTimeout: this is obviously on ly for demonstrational purposes, and I agree it's most likely not the right place, adding confusion. I guess this should be handled by some other 'client' thread.

NicolasT commented 10 years ago

@jkozlowski is right: new entries to be distributed in the log are added to the log on the leader outside this library (and I really want to keep itlike that). Raft is all about distrbuting a (potentially) growing log, but from the protocol perspective this log is static.

Sorry for the multiple comments, working from a tablet, not that easy...

NicolasT commented 10 years ago

Regarding multiple concurrent transactions: that's certainly possible, and nor the Raft protocol, nor the library design (if #7 is fixed) prohibit this.

Here's the trick: whenever some code appends a transaction to the log, it knows the log index (and can e.g. connect this to some client request). When the driver (in which collaborates with whichever subsystem is manipulating the log) receives a CSetCommitIndex command, it can know which transactions have been distributed correctly (i.e. all transactions with a lower log index).

It is not within the scope of this library to handle this. All Kontiki should provide is a 'model' implementation of the basic Raft consensus algorithm which is all about replication a (most likely growing over time) log/journal.

jkozlowski commented 10 years ago

Yep, that now makes sense, thanks.

NicolasT commented 10 years ago

@jkozlowski It'd be great to have some testing in line with what @Gabriel439 did there. I don't think the overall API of this library should change too much, except for a new command being added as part of #7, and even without that, the current implementation can be tested partially, even though a commit-index is not exposed (e.g. the master election part of the protocol is fairly distinct from the log).

qrilka commented 10 years ago

@NicolasT, you are right that descussions of CSetCommitIndex should go to #7 And as for your words on udp demo and handleHeartbeatTimeout this code is not in the demo itself and is the part of current kontiki. And I don't see any other way to pass new entries into kontiki so it will take them into account and generate CSend as a result. Do you see any such way?

NicolasT commented 10 years ago

I'm not sure I understand what you want to achieve. You mean you want to trigger a replication manually after extending the log, not waiting for a heartbeat timeout?

qrilka commented 10 years ago

Looking into Raft paper I see:

Once a leader has been elected, it begins servicing
client requests. Each client request contains a command
to be executed by the replicated state machines. The
leader appends the command to its log as a new entry, then
issues AppendEntries RPCs in parallel to each of the other
servers to replicate the entry. 

So client request is supposed be replicated immediately. But looking into a Go implementation at https://github.com/goraft/raft/blob/master/peer.go#L124 they also send entries only on heartbeat. But its better to look into original implementation. And there I see append function - https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1549 and that function triggers stateChanged which will result in entries being sent (almost) immediately in peerThreadMain https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1397

qrilka commented 10 years ago

BTW if looks like this details are not actually about "MonadLog vs CLogEntries" not sure if we should split it into another issue or issue should be renamed. Looks like the second option is better as github does not allow to move comments between issues.

NicolasT commented 10 years ago

Aha! I think I know what's going on.

The Raft paper has gone through a couple of revisions (@ongardie might know more). The Kontiki code is based on an early draft version of the paper, and so is @goraft. In that version of the paper, the leader didn't provide write-access to the log, it simply replicated 'some log'. Whilst the newer approach might be easier to explain, I tend to like the old one more (simplifies things a lot!) and think we should stick to that approach (unless there's an important reason not to).

qrilka commented 10 years ago

Maybe you have a link to that draft? (Current paper on https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf also has status of draft) Commit 51773c3af76c503d8a336cb03aa24ee1b9d7e4f3 in logcabin (which includes append) is from 2012-09-10 i.e. 8 months before kontiki was even started. Postponing sending to hearbeat could improve throughput but would affect latency negatively. I don't see much simpification in prohibiting single entry - there should be no problem in appending just a single entry and invoking almost the same code as in handleHeartbeatTimeout (or should it be exactly the same?). I think that limiting log entries sending only to hearbeat events is to restrictive for a library dealing with Raft

NicolasT commented 10 years ago

Agree, adding code to inject/trigger a heartbeat-timeout (or something along those lines) would make sense. But it's strictly speaking not required ;-)

qrilka commented 10 years ago

A the moment I'm not sure if kontiki actually need separate function to implement it as it seems that injecting heartbeats could be enough. I was thinking on implement it outside of kontiki (around actual network protocol the way you have it in udp.hs demo) to see how it will look like. But looks like I have just forgot about it :) Hope to do it when I'll have some time.