filecoin-project / oni

👹 (DEPRECATED; see README) Project Oni | Network Validation
https://docs.google.com/document/d/16jYL--EWYpJhxT9bakYq7ZBGLQ9SB940Wd1lTDOAbNE

[test request] tipset assembly in real-world network conditions (monte carlo simulation) #14

Open raulk opened 4 years ago

raulk commented 4 years ago

What would you like us to test?

Tipset assembly under real-world network conditions: delays, packet loss, packet duplication, jitter, and latency between miners.

Technical implementation details.

Possibly long-running tests using some Monte Carlo simulation model. Need to define what metrics to gather from the system as the tests run. I envision this as a long-running loop that deploys, for example, 50 lotus instances.
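A minimal sketch of how the random parameters for each run might be drawn. The `NetworkConditions` fields mirror the conditions listed above; the names, the value ranges, and the uniform distributions are all placeholder assumptions, not something this issue has settled on.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NetworkConditions:
    """Hypothetical parameter set for one simulated network run."""
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    duplicate_pct: float


def sample_conditions(rng: random.Random) -> NetworkConditions:
    # The ranges below are illustrative assumptions only.
    return NetworkConditions(
        latency_ms=rng.uniform(0, 500),
        jitter_ms=rng.uniform(0, 100),
        loss_pct=rng.uniform(0, 10),
        duplicate_pct=rng.uniform(0, 5),
    )


def plan_runs(n_runs: int, seed: int) -> List[NetworkConditions]:
    # A fixed seed makes a batch reproducible, so an interesting run
    # can be re-created later from (seed, index).
    rng = random.Random(seed)
    return [sample_conditions(rng) for _ in range(n_runs)]
```

Seeding the generator per batch is a design choice worth considering: when one of hundreds of overnight runs shows something odd, its exact conditions can be regenerated instead of being dug out of logs.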

For each iteration:

What should we measure?

TBD.

Which components are involved?

TBD.

On a scale from 0-10, what's the proposed _discomfort factor_? In other words, how uncomfortable would you be if we went live without having tested this? Explain why.

TBD.

Additional remarks.

TBD.

Requestor: @magik6k.

raulk commented 4 years ago

@yusefnapora and I just discussed what a good charting/graph approach would be. The batch runner allows us to run lots of simulations, one after another, with random parameters. If we run these overnight, we'll wake up to hundreds of network simulations that we need to make sense of quickly by paging through the results.

We came up with this:

yusefnapora commented 4 years ago

@raulk here's what I've got so far:

[image: chain height with reverts]

The "effective height" takes revert operations into account, so the little downward blips are reverts followed by applies. I was interested to learn that there are several revert / apply operations for normal fast-forwards, but it makes sense once you see them. If we have a tipset with a single block and get another valid block to include, we revert the single-block tipset and apply the new one. So there's a "revert blip" for every tipset with multiple blocks.
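Roughly, the "effective height" series can be derived like this. This is a sketch, not the actual chart code: the event tuple shape is an assumption, and it treats a revert at height N as dropping the effective height to N - 1 until the next apply.

```python
from typing import List, Tuple

# Assumed event shape: (seconds since start, "apply" | "revert", tipset height)
Event = Tuple[float, str, int]


def effective_heights(events: List[Event]) -> List[Tuple[float, int]]:
    """Turn a stream of apply/revert operations into (time, height) points."""
    series = []
    for ts, op, height in events:
        if op == "apply":
            series.append((ts, height))
        elif op == "revert":
            # Reverting the tipset at `height` leaves the chain one below it.
            series.append((ts, height - 1))
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return series


# A single-block tipset at height 10 is reverted and re-applied with a
# second block, producing the downward "blip" before height 11 arrives.
events = [(0.0, "apply", 10), (1.0, "revert", 10), (1.0, "apply", 10), (2.0, "apply", 11)]
print(effective_heights(events))
# [(0.0, 10), (1.0, 9), (1.0, 10), (2.0, 11)]
```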

Next step is to combine this graph across all the test participants; so far I've just been working with a single miner for simplicity, but it's collecting data from everyone.

After that, we can see how it looks with weird network conditions :)

raulk commented 4 years ago

Nice, this is a great start! 😍 In fact, the 46s mark is showing a slightly different pattern than the rest (the valley is a bit wider). Are the downward blips the number of revert operations? I wonder if we can find a way to draw all three: the number of revert operations, the heights that were reverted, AND the number of unique block CIDs seen at that height?

Rationale: if we are reverting a tipset with block B1 at height N to replace it with a tipset with B1,B2 at height N, to then replace it with a tipset with B1,B2,B3 at height N => this would be ordinary behaviour. And what we want to capture there is the time it took before we advanced (the width of the valley, as you are showing here).
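To make the B1 / B1,B2 / B1,B2,B3 example concrete, here is a rough sketch of collecting those numbers per height. The event shape and the `per_height_stats` name are hypothetical, and "time before we advanced" is assumed to mean the gap between the first apply at a height and the first apply at the next observed height.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Assumed event shape:
# (seconds since start, "apply" | "revert", tipset height, block CIDs in the tipset)
Event = Tuple[float, str, int, FrozenSet[str]]


def per_height_stats(events: List[Event]) -> Dict[int, dict]:
    reverts = defaultdict(int)      # height -> number of revert operations
    cids = defaultdict(set)         # height -> every block CID seen there
    first_apply: Dict[int, float] = {}
    for ts, op, height, blocks in events:
        cids[height] |= blocks
        if op == "revert":
            reverts[height] += 1
        elif op == "apply":
            first_apply.setdefault(height, ts)

    heights = sorted(first_apply)
    stats = {}
    for i, height in enumerate(heights):
        # Use the next observed height rather than height + 1, so a
        # skipped (null) round does not leave the width undefined.
        nxt = first_apply[heights[i + 1]] if i + 1 < len(heights) else None
        stats[height] = {
            "reverts": reverts[height],
            "unique_cids": len(cids[height]),
            "seconds_before_advance": None if nxt is None else nxt - first_apply[height],
        }
    return stats


events = [
    (0.0, "apply", 10, frozenset({"B1"})),
    (2.0, "revert", 10, frozenset({"B1"})),
    (2.0, "apply", 10, frozenset({"B1", "B2"})),
    (5.0, "revert", 10, frozenset({"B1", "B2"})),
    (5.0, "apply", 10, frozenset({"B1", "B2", "B3"})),
    (30.0, "apply", 11, frozenset({"C1"})),
]
print(per_height_stats(events)[10])
# {'reverts': 2, 'unique_cids': 3, 'seconds_before_advance': 30.0}
```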

And a fork should look completely different, I guess.
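One possible heuristic for telling the two apart, purely as a sketch: a revert followed by an apply at the same height whose block set strictly contains the reverted one is ordinary tipset growth, and anything else is flagged as a possible fork. The `Tipset` shape and `classify_replacement` are made up for illustration, and a real check would probably also need to compare parents.

```python
from typing import FrozenSet, Tuple

# Assumed shape: (height, block CIDs in the tipset)
Tipset = Tuple[int, FrozenSet[str]]


def classify_replacement(reverted: Tipset, applied: Tipset) -> str:
    """Label a revert-then-apply pair using the superset heuristic."""
    r_height, r_blocks = reverted
    a_height, a_blocks = applied
    # `<` on frozensets is the strict-subset test.
    if r_height == a_height and r_blocks < a_blocks:
        return "tipset growth"
    return "possible fork"


print(classify_replacement((10, frozenset({"B1"})), (10, frozenset({"B1", "B2"}))))
# tipset growth
print(classify_replacement((10, frozenset({"B1"})), (10, frozenset({"X1"}))))
# possible fork
```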

raulk commented 4 years ago

We can definitely get rolling with this, though. Let's start running the batch jobs and collecting the raw data! In parallel, we can fine-tune the visualisations while those jobs are running.

raulk commented 4 years ago

Here's a pretty poor sketch with some further ideas.

[image]

^^ take all of this as creative input, not as instructions ;-)

yusefnapora commented 4 years ago

@raulk this is great stuff, thanks :)

Good eye on the different pattern at 46s - that was a tipset with three blocks, so there were two reversions. You can see it if you zoom in a bit:

[image: chain height, multiple reverts]

I think something like your stair-step graph is possible with the chart libs I'm using, but I'll need to dig in a bit more.