Create cluster simulator

jkozlowski commented 10 years ago

The simulator can simulate random things happening to the cluster, e.g. nodes dying/coming back up/timeouts/nodes processing events and processing resultant commands.
Added first property that runs the cluster until there is a stable leader elected by the quorum.
May need to be extended further to add actual events arriving that are then replicated but I'd rather have that in a separate commit when I try to fix NicolasT/kontiki#10: would be intersting to see if I can write a test case to prove that the bug is there. Fixed jkozlowski/kontiki#2

jkozlowski commented 10 years ago

Hey,

There is still the issue of modelling the events slightly better (sometimes the test might run longer than a minute in which case it times out), but I think it's a good first cut. I was hoping you guys could take a preliminary look to see if you like the approach or not + it's really my first piece of non-trivial haskell so any feedback is welcome.

For the failures I think of two things: 1) Pragmatic: define the whole test passed if, say, 90 of the cases (out of the usual 100) passed and didn't fail with a timeout. 2) Try to model the cluster a bit better: few improvements

For each node, keep a queue of messages per the remaining peers and draw randomly from them, to simulate messages arriving randomly.
Model timeouts by counting simulation steps: instead of just randomly firing, actually model it by sending those once a leader node goes down (I'm a bit fuzzy here, but I think this can be improved)
Model node failure probability by somehow tying it to the uptime (so nodes that have been up longer, have higher probability of failure).

I think this approach is not 100% deterministic, not sure if it can be made. In either case it is possible to get a sporadic failure, but we're making the probability smaller.

Let me know what you think

qrilka commented 10 years ago

I'm eager to take a closer look, hope that I'll have enough time on this weekend. As a side question: do you plan any real use of kontiki or just "playing around"?

jkozlowski commented 10 years ago

Thanks, I'd really appreciate a review :)

I treat it as a great way to learn haskell and I'd love to see it finished with persistence etc. I code Java at work, so chances of me using anything like this are slim atm. I obviously have an ulterior motive as well, I'd like to become more active in open source.

What about you?

qrilka commented 10 years ago

I worked on a couple of projects earlier this year and one of them required distributed storage mechamism. Actually it was discontinued in the begining of the summer. And (even) after that I've noticed etcd and Raft and then came to this library. At the moment I don't have any active Haskell projects at work and do this (though not very actively) just as hobby.

jkozlowski commented 10 years ago

Nice. I read the raft paper at about the same time as I seriously started reading Haskell, so it's nice there was this project to hack on.

I saw etcd, it's getting quite a bit of press now due to docker and then projects like Flynn and coreos, so it's quite a hot topic. We're a bit late to the party though, got beaten by go-lang, damn :) it's a pity as it could bring good press to Haskell, more than anything FP complete could do, not that they aren't awesome.

Not sure what the value proposition would have to be to replace etcd with something built on top of kontiki, but who knows... :)

Sent from my iPhone

On 20 Dec 2013, at 06:09, Kirill Zaborsky notifications@github.com wrote:

I worked on a couple of projects earlier this year and one of them required distributed storage mechamism. Actually it was discontinued in the begining of the summer. And (even) after that I've noticed etcd and Raft and then came to this library. At the moment I don't have any active Haskell projects at work and do this (though not very actively) just as hobby.

— Reply to this email directly or view it on GitHub.

jkozlowski commented 10 years ago

I am now thinking my testing should go along the lines of simulating the cluster for N number of events and testing that the properties of the Raft protocol are true at all times, as per paper Section 5:

Election Safety: at most one leader can be elected in a given term. §5.2
Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries. §5.3
Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. §5.3
Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms. §5.4
State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index. §5.4.3

These look like the could be codeable: also, with this approach it doesn't matter if the events happening are not realistic, as long as the cluster adheres to these. The bug in NicolasT/kontiki/#10, however, would not potentially be discoverable, as other pieces of code guard against this and if a leader was elected, it would be forced to step down. But these type of tests should be a good compliment to unit-tests. I shall give that a go tomorrow.

NicolasT commented 10 years ago

I think what @jkozlowski wrote in a comment above is the way to go: given a set of clusters, let all clusters behave 'correctly' (no Byzantine behavior), i.e. let them only use the state-machine as-is, not making up random events/messages, then let them act by communicating on lossy/reordering/high-latency/... channels, and at all times ensure the Raft invariants are kept. I.e. some sort of model-checking (think Spin/Promela and others).

I'll go through the code now.

NicolasT commented 10 years ago

I really like this. One request though: could you split-out the 'plumbing' from the actual test-code (in a different module)? I feel like this should make adding more tests/checks easier.

jkozlowski commented 10 years ago

Thanks for the review, I'll go through the individual comments inline. As mentioned, this pull request was to illicit initial discussion if what I'm thinking makes sense, I've started working on the second iteration already.

The idea is to, on each tick of the simulation, simulate a single event for all nodes (but don't append resulting events to queues until all are simulated, so that there's arbitrary ordering allowed), where event can be timer going off, simulating event from queue, killing node. Once all are simulated, shuffle the events and distribute them among queues (this is how we get latency and reordering).

That means I'll want to actually simulate timeouts properly, so I'm thinking of somehow modelling the average broadcastTime (as per the paper, so I can get: broadcastTime < electionTimeout < MTBF). I'm looking at various queing models in papers, it's a lot of fun. I probably won't have this for this iteration, but it's on my mind.

Should be fun.

qrilka commented 10 years ago

@jkozlowski, the moment I don't understand in your current implementation - is there anything like time in this model? From the code I see I'm not sure that heartbeats will be close to being periodic. Also why there is no simulation event appending new entries? It looks like frequency function you use will result in proportional probabilities of node being killed and node responses being sent. I'm not proficient in simulations but I'm not sure that such correlation between different model parameters will give proper results.

jkozlowski commented 10 years ago

Yep, this first iteration was not very realistic in that all events were random. I want to simulate timeouts properly in second iteration (and more proper nodes going down, by drawing the restart times from some distribution and restarting nodes according to those).

As a side note, even thought this current model is not realistic, the properties mentioned in the paper should still hold (and yes, I missed the appending of events, as the first test I wanted to have was leader election, which turns out to be not really what we want).

Thanks to both for all your feedback, publishing this meant I had to review it myself and I think we now should have a better model, I just need to implement it :)

qrilka commented 10 years ago

BTW if you'll find good papers how to do such simulations properly - please share :)

jkozlowski commented 10 years ago

Sure :) I think the point here is to decouple it from time and think about it in terms of ticks as defined above (which means you can run it regardless of wall time). Once I have that I'll just have to derive a formula to calculate average size of the incoming queue, which will give me the broadcastTime in ticks which means I can bound the timeouts with it and we're home.

At least that's my theory :)

NicolasT commented 10 years ago

I've been thinking about this a little more, and I think there might be an alternative testing strategy. What if, instead of testing the FSM as-is by feeding quasi-random input & determining the output etc, a different approach is taken: make a full map of the FSM states, and feed this into a model checker.

Obviously it's impossible to generate/dump all states, but I think it should be feasible to generate, e.g. in a 3-'node' system, all possible states on every node (modulo actual log content and maybe some other variables which don't really matter, obviously) and turn this into a Promela script, after which all invariants can be added, and checked using Spin.

This might be a bit overkill, but anyway... Heck, it could be a really cool research project. Anyone looking for a master thesis subject? "Validating a Haskell Raft FSM using Spin" :wink:

jkozlowski commented 10 years ago

Yep, I thought about that too, but I don't have much experience with this approach. I think that's how they've validated the algorithm itself, from what I remember.

A bit busy at work this week, I'm hoping to come back to this over the weekend.

NicolasT commented 10 years ago

Indeed, there's a TLA+ proof (http://raftuserstudy.s3-website-us-west-1.amazonaws.com/proof.pdf). Note that's the other way around though: this proof provides assurance about the protocol. What we're looking for is assurance about the correctness of the implementation.

jkozlowski commented 10 years ago

Yep. I started reading about this today, after seeing a master thesis about testing CERN's state machines (they used a different language, but it's the same thing). I'll try to finish up what I have and then I'll see if we can attack it this way, I'm intrigued:)

Sent from my iPhone

On 10 Jan 2014, at 20:48, Nicolas Trangez notifications@github.com wrote:

Indeed, there's a TLA+ proof (http://raftuserstudy.s3-website-us-west-1.amazonaws.com/proof.pdf). Note that's the other way around though: this proof provides assurance about the protocol. What we're looking for is assurance about the correctness of the implementation.

— Reply to this email directly or view it on GitHub.

NicolasT commented 10 years ago

Cool! :thumbsup:

jkozlowski commented 10 years ago

Just found this, thought might be of interest: http://javapathfinder.sourceforge.net/. Effectively Java -> Promela, at least it appears so from the first paper: http://havelund.com/Publications/jpf1-report-99.pdf

abailly commented 8 years ago

How about using/replicating what jepsen does for testing?

NicolasT / kontiki

Create cluster simulator #11