Open qrilka opened 10 years ago
The `MonadLog` interface is used by the state machine to access the log as if it were an immutable/pure data structure somewhere, although in real code this will need IO etc. While the state machine performs a single transition/step, the log is effectively frozen.
Similar to other `Command`s, modifications to the log can be the result of a state transition, hence the `CLogEntries` and `CTruncateLog` `Command`s.
So:

- `MonadLog` is read-only/'pure' access to the log from within the state machine
- `CLogEntries` and `CTruncateLog` are modifications to the log as a result of a state transition

These modifications need to be applied to the underlying log (so they become visible through later `MonadLog` actions), but this is not (and should not be) handled by Kontiki. The 'udp' demo might give some more insight.
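To make that split concrete, here is a minimal sketch of what a driver outside Kontiki might do; the `Entry`, `Command`, and `applyCommand` names and the `IORef`-backed log are assumptions for illustration, not the actual kontiki API:

```haskell
import Data.IORef

-- Hypothetical entry and command types for illustration.
data Entry = Entry { entryIndex :: Int, entryPayload :: String }

data Command
  = CLogEntries [Entry]   -- append these entries to the log
  | CTruncateLog Int      -- drop all entries after the given index

-- The driver owns the mutable log; the state machine never mutates it,
-- it only emits Commands, which the driver applies between steps.
applyCommand :: IORef [Entry] -> Command -> IO ()
applyCommand logRef cmd = modifyIORef' logRef $ \entries ->
  case cmd of
    CLogEntries new  -> entries ++ new
    CTruncateLog idx -> takeWhile ((<= idx) . entryIndex) entries
```

The point of the design is visible here: between state-machine steps the driver applies the emitted commands, so the next step's `MonadLog` reads see the updated log while each individual step still observes a frozen one.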
You're right about docs, I should get to #6 asap.
Actually the 'udp' demo seems to be the main source of my current understanding of kontiki.
And it is quite confusing that new values are being generated inside the Leader itself - that is quite far from Raft's intended usage.
And for the #7 correction, the problem with `CLogEntries` uniformity (which I describe above) complicates the situation: `handleHeartbeatTimeout` generates `CLogEntries` and could generate something like `CCommitted`, but that is code from the kontiki library, so the Leader cannot observe this `CCommitted` other than by processing `CSend` and extracting the committed-index information from it. And that seems to be a very unintuitive control flow.
The main problem is that every node (including the Leader) needs both the log entries and the committed index for correct Raft use. Log entries are easy, as the Leader receives them (from somewhere), but the committed index is controlled by the Raft implementation, so the Leader has to get it from Raft itself somehow.
I forgot to watch this repo when I forked, so I missed this discussion.
That was my confusion as well: a client request to log an entry does not seem to be encoded in the API, so the leader's log is updated as if under its feet, implicitly by the driver that uses the model.
Also, from my current understanding, there can really only be a single transaction in flight, as there is no notion of "issue a write" and "write completed", so the driver needs to perform all commands synchronously. After reading the paper, I'm not sure if it's Raft's limitation or this implementation's. I could see that it should be possible to implement.
So how do we proceed? I have this weekend free for some Haskell hacking; I was thinking about testing in the spirit of http://www.haskellforall.com/2013/11/test-stream-programming-using-haskells.html, but if the API is going to change, I suppose it's a bit pointless...
Regarding `CSetCommitIndex`: this is issued upon every `AppendEntriesRequest` received on a follower, or on a leader whenever an `AppendEntriesResponse` is received (the leader can recalculate the commit index there, since that's where it keeps track of the log indices stored on followers). The discussion about this should go into #7 though :-)
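That recalculation on the leader can be sketched like this (a simplified illustration, not kontiki code; it assumes one match index per cluster node, the leader included via its own last log index, and it omits Raft's rule that only entries from the current term may advance the commit index):

```haskell
import Data.List (sort)

-- For each node in the cluster (leader included), the highest log
-- index known to be stored on that node.
type MatchIndices = [Int]

-- An index is committed once a majority of nodes store it.  The lower
-- median of the sorted match indices is the highest index stored by a
-- majority: e.g. for [1,2,3,4,5] it is 3, and nodes {3,4,5} - three of
-- five - all store index 3.
recalculateCommitIndex :: MatchIndices -> Int
recalculateCommitIndex matchIndices =
  sort matchIndices !! ((length matchIndices - 1) `div` 2)
```

This is why the leader is the natural place to issue the command: it is the only node that tracks everyone's match indices, which it updates on each `AppendEntriesResponse`.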
Regarding the adding of log entries in the `udp` demo within `handleHeartbeatTimeout`: this is obviously only for demonstration purposes, and I agree it's most likely not the right place, adding confusion. I guess this should be handled by some other 'client' thread.
@jkozlowski is right: new entries to be distributed in the log are added to the log on the leader outside this library (and I really want to keep it like that). Raft is all about distributing a (potentially) growing log, but from the protocol's perspective this log is static.
Sorry for the multiple comments, working from a tablet, not that easy...
Regarding multiple concurrent transactions: that's certainly possible, and neither the Raft protocol nor the library design (once #7 is fixed) prohibits it.
Here's the trick: whenever some code appends a transaction to the log, it knows the log index (and can e.g. connect this to some client request). When the driver (which collaborates with whichever subsystem is manipulating the log) receives a `CSetCommitIndex` command, it knows which transactions have been distributed correctly (i.e. all transactions with a lower log index).
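A sketch of that bookkeeping, which would live entirely in the driver (the `Pending` type and `completeUpTo` are hypothetical names, not part of kontiki):

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Per appended log index, whatever is needed to answer the client that
-- issued the write (a callback, an MVar, a connection handle, ...).
type Pending a = Map Int a

-- When a CSetCommitIndex-style command arrives, every pending request
-- at or below the new commit index is known to be safely replicated:
-- hand those back for completion and keep the rest waiting.
completeUpTo :: Int -> Pending a -> ([a], Pending a)
completeUpTo commitIndex pending =
  let (done, still) =
        Map.partitionWithKey (\ix _ -> ix <= commitIndex) pending
  in (Map.elems done, still)
```

Because completion is keyed by log index, many writes can be in flight at once: each append records its index, and one commit-index advance completes all of them in a single step.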
It is not within the scope of this library to handle this. All Kontiki should provide is a 'model' implementation of the basic Raft consensus algorithm, which is all about replicating a (most likely growing) log/journal.
Yep, that now makes sense, thanks.
@jkozlowski It'd be great to have some testing in line with what @Gabriel439 did there. I don't think the overall API of this library should change too much, except for a new command being added as part of #7, and even without that, the current implementation can be tested partially, even though a commit-index is not exposed (e.g. the master election part of the protocol is fairly distinct from the log).
@NicolasT, you are right that discussions of `CSetCommitIndex` should go to #7.
And as for your words on the `udp` demo and `handleHeartbeatTimeout`: this code is not in the demo itself, it is part of current kontiki. And I don't see any other way to pass new entries into kontiki so that it will take them into account and generate `CSend` as a result.
Do you see any such way?
I'm not sure I understand what you want to achieve. You mean you want to trigger a replication manually after extending the log, not waiting for a heartbeat timeout?
Looking into the Raft paper I see:

> Once a leader has been elected, it begins servicing client requests. Each client request contains a command to be executed by the replicated state machines. The leader appends the command to its log as a new entry, then issues AppendEntries RPCs in parallel to each of the other servers to replicate the entry.
So a client request is supposed to be replicated immediately.
But looking into a Go implementation at https://github.com/goraft/raft/blob/master/peer.go#L124, they also send entries only on heartbeat.
But it's better to look into the original implementation. And there I see an `append` function - https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1549 - and that function triggers `stateChanged`, which will result in entries being sent (almost) immediately in `peerThreadMain`: https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1397
BTW it looks like these details are not actually about "MonadLog vs CLogEntries"; I'm not sure if we should split this into another issue or rename this one. The second option looks better, as GitHub does not allow moving comments between issues.
Aha! I think I know what's going on.
The Raft paper has gone through a couple of revisions (@ongardie might know more). The Kontiki code is based on an early draft version of the paper, and so is @goraft. In that version of the paper, the leader didn't provide write-access to the log, it simply replicated 'some log'. Whilst the newer approach might be easier to explain, I tend to like the old one more (simplifies things a lot!) and think we should stick to that approach (unless there's an important reason not to).
Maybe you have a link to that draft? (The current paper at https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf also has draft status.)
Commit 51773c3af76c503d8a336cb03aa24ee1b9d7e4f3 in logcabin (which includes `append`) is from 2012-09-10, i.e. 8 months before kontiki was even started.
Postponing sending until a heartbeat could improve throughput but would affect latency negatively.
I don't see much simplification in prohibiting single-entry appends - there should be no problem in appending just a single entry and invoking almost the same code as in `handleHeartbeatTimeout` (or should it be exactly the same?).
I think that limiting the sending of log entries to heartbeat events only is too restrictive for a library dealing with Raft.
Agree, adding code to inject/trigger a heartbeat-timeout (or something along those lines) would make sense. But it's strictly speaking not required ;-)
At the moment I'm not sure if kontiki actually needs a separate function to implement it, as it seems that injecting heartbeats could be enough. I was thinking of implementing it outside of kontiki (around the actual network protocol, the way you have it in the udp.hs demo) to see how it would look. But it looks like I just forgot about it :) Hope to do it when I have some time.
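The "injecting heartbeats could be enough" idea might look roughly like this outside kontiki (the `Event` type, its constructor names, and `appendToLog` are assumptions for illustration; they are not kontiki's actual API):

```haskell
import Control.Concurrent.STM

-- Hypothetical event type for the driver's main loop; the real driver
-- would feed events like these into kontiki's step function.
data Event = EHeartbeatTimeout | EElectionTimeout

-- After client-facing code extends the leader's log, inject a
-- heartbeat-timeout event into the same queue the timer thread writes
-- to, so handleHeartbeatTimeout runs (and emits CSend) right away
-- instead of waiting for the real timer to fire.
appendAndReplicate :: TQueue Event -> IO () -> IO ()
appendAndReplicate events appendToLog = do
  appendToLog
  atomically $ writeTQueue events EHeartbeatTimeout
```

This keeps the library untouched: replication latency becomes a driver concern, and the state machine still only ever reacts to events it already understands.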
While thinking about a patch for #7 I've stumbled upon a problem: at the moment I don't really understand the current kontiki API. I.e. I don't understand whether client code should use `MonadLog` or `CLogEntries`. The first seems to provide a way to log any `Entry` appearing on a node. But `CLogEntries` does not seem to be uniform - it can be received only on Followers, and for Leaders some special handling should be used. Is this a problem with my API understanding, or is there some inconsistency in the API? BTW haddocks could help here too :)