filecoin-project / venus

Filecoin Full Node Implementation in Go
https://venus.filecoin.io

Runnable Benchmarks For Chainsync #3280

Closed hannahhoward closed 3 years ago

hannahhoward commented 5 years ago

Description

As we work on improving ChainSync performance, we need a benchmark we can run to track our improvements.

What we have so far is:

We need benchmarks that sit somewhere between "testing a very small part of the process" and "chainsync is slow after devnet has been running three months". Such benchmarks could cover an entire chainsync process, including network fetch time, or they could test some small but sizable part of the process.

Acceptance criteria

An actual Go benchmark we can run repeatedly, perhaps eventually as part of CI (though that is far off), that will give us an approximate idea of how long it takes to sync a sizable, real-world chain. Ideally we could test against multiple types of chains with various properties.
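As a rough sketch of what such a benchmark could look like, the snippet below uses Go's testing.Benchmark so it can be run ad hoc long before any CI integration. Everything here is hypothetical: buildTestChain and syncChain are placeholders for whatever chain-generation machinery and sync entry point we settle on, and the block type stands in for the node's real block type.

```go
package main

import (
	"fmt"
	"testing"
)

// block is a toy stand-in for the node's real block type (hypothetical).
type block struct {
	height  int
	payload []byte
}

// buildTestChain is a hypothetical helper standing in for whatever
// chain-generation approach we choose (see "Where to begin").
func buildTestChain(length int) []block {
	chain := make([]block, length)
	for i := range chain {
		chain[i] = block{height: i, payload: make([]byte, 64)}
	}
	return chain
}

// syncChain is a placeholder for the real sync entry point; here it
// just walks the chain so the harness has something to measure.
func syncChain(chain []block) int {
	synced := 0
	for _, b := range chain {
		_ = b.payload
		synced++
	}
	return synced
}

func main() {
	// testing.Benchmark runs a Go benchmark outside `go test`,
	// which is handy for ad-hoc runs before anything lands in CI.
	result := testing.Benchmark(func(b *testing.B) {
		chain := buildTestChain(10000)
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			syncChain(chain)
		}
	})
	fmt.Println("ran iterations:", result.N > 0)
}
```

The same harness could be parameterised over chain length and message density to cover the "multiple types of chains with various properties" goal.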

Risks + pitfalls

The biggest obstacle to benchmarks is that we have no way to quickly produce a real blockchain that simulates what real transactions operating on a real Filecoin network would look like (or even to produce any such chain in a repeatable manner).

There's the additional risk of spending so much time on this problem that we don't get to actually improving things :)

Where to begin

It seems like there are two potential routes for tackling the big problem, which is making chains:

anorth commented 5 years ago

Thanks for the issue and insight.

I think we can probably reach some valuable low-hanging fruit quickly with approach (ii). That will be much faster to execute in addition to offering more control. As you point out, exerting that control will require effort as we try to make a chain that's more representative of a real network. As an opinion, I suggest wrapping the existing chain.Builder with "application-level" functionality, rather than coupling it directly with specific actor methods etc.
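To illustrate the "wrap rather than couple" idea, here is a minimal sketch. The builder interface, fakeBuilder, and all names below are hypothetical, since the real chain.Builder API may differ; the point is only that callers see application-level operations instead of specific actor methods.

```go
package main

import "fmt"

// builder abstracts the minimal surface we would need from the existing
// chain.Builder; the real type's API may differ (assumption).
type builder interface {
	AppendBlock(msgCount int) string
}

// fakeBuilder is a stand-in implementation so the sketch is runnable.
type fakeBuilder struct{ height int }

func (f *fakeBuilder) AppendBlock(msgCount int) string {
	f.height++
	return fmt.Sprintf("block@%d(%d msgs)", f.height, msgCount)
}

// appBuilder wraps the low-level builder with "application-level"
// functionality, rather than coupling callers to actor internals.
type appBuilder struct{ b builder }

// GrowChain appends `epochs` blocks, each carrying a fixed message
// load; a real version might draw loads from a distribution that
// mimics observed network traffic (hypothetical).
func (a *appBuilder) GrowChain(epochs, msgsPerEpoch int) []string {
	out := make([]string, 0, epochs)
	for i := 0; i < epochs; i++ {
		out = append(out, a.b.AppendBlock(msgsPerEpoch))
	}
	return out
}

func main() {
	ab := &appBuilder{b: &fakeBuilder{}}
	blocks := ab.GrowChain(3, 10)
	fmt.Println(len(blocks), blocks[0])
}
```

Benchmarks would then script chain shapes through the wrapper, keeping the low-level builder free to change underneath.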

Our work so far points to state tree storage and serialisation, and we can give that a good workout without invoking the complexities of real proof calculation or validation required for a more representative network (and when data do point to those latter as bottlenecks, they can be benchmarked independently).

Something like approach (i) will still be valuable, but I think a bit later.