MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

Investigate performance of Mina node before bootstrap #15763

Open georgeee opened 1 week ago

georgeee commented 1 week ago

Context

In most runs, bootstrap and catchup are the longest routines when synchronizing a just-started node.

Bootstrap happens only due to one of the following circumstances:

All in all, it is not expected to happen on every run.

The catchup algorithm is known for its inefficiencies, and there's a separate stream of work (around integrating Bitswap) to improve catchup time.

However, a noticeable amount of time is spent by the Mina node before bootstrap/catchup starts. It's worth investigating this separately.

Sneak peek into the numbers

Synchronization time varies significantly with connectivity (how many healthy peers the Mina node was able to connect to), hardware (CPU/RAM configuration) and network state (whether synchronization is attempted at a moment of high transaction traffic). It's therefore useful to look at near-perfect conditions to see what the numbers are there.

Using the log explorer, it's possible to check how different parts of the code perform when executed on o1-managed mainnet nodes. These nodes are known to be healthy, long-running, well-resourced, etc.

Some observations based on querying the logs:

TL;DR

A Mina node spends ~13min (measured in near-perfect conditions on mainnet) on initialization before the actual synchronization (bootstrap/catchup) even starts.
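For anyone who wants to reproduce this kind of measurement on their own node, here is a minimal sketch of computing the time between two log messages. It assumes the logs are JSON lines with `timestamp` and `message` fields and a `YYYY-MM-DD HH:MM:SS.ffffffZ` timestamp format; both are assumptions and should be checked against real log output. The sample entries (timestamps and the chain id value) are made up for illustration, and `elapsed_between` is a hypothetical helper, not part of any Mina tooling.

```python
import json
from datetime import datetime


def parse_ts(raw):
    # Assumed timestamp format, e.g. "2024-01-01 00:00:00.000000Z".
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%fZ")


def elapsed_between(lines, start_prefix, end_prefix):
    """Return the time between the first message starting with start_prefix
    and the next message starting with end_prefix, or None if not found."""
    start = None
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        message = entry.get("message", "")
        if start is None and message.startswith(start_prefix):
            start = parse_ts(entry["timestamp"])
        elif start is not None and message.startswith(end_prefix):
            return parse_ts(entry["timestamp"]) - start
    return None


# Made-up sample data; real entries carry more fields.
sample_log = [
    '{"timestamp": "2024-01-01 00:00:00.000000Z", "message": "Initializing with runtime configuration"}',
    "a non-JSON line that is skipped",
    '{"timestamp": "2024-01-01 00:01:30.000000Z", "message": "Daemon will use chain id abc"}',
]

delta = elapsed_between(
    sample_log,
    "Initializing with runtime configuration",
    "Daemon will use chain id",
)
print(delta)
```

Matching on message prefixes keeps the sketch tolerant of variable suffixes such as the chain id itself.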

mrmr1993 commented 1 week ago
  • It takes around 1m30s to perform computations between the log messages "Initializing with runtime configuration" and "Daemon will use chain id <..>"

We've found that the biggest contributor to this is likely generating the SRS (and its Lagrange basis) to initialise the proof system. We can cache this (and were planning to anyway, to speed up unit tests).

  • Initialization of prover and verifier processes takes around 1m each

This is the same issue: at this point we use the cached SRS, but still recompute the Lagrange basis using a big FFT.
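To illustrate the general idea of caching a derived value so it is not recomputed on every start, here is a minimal sketch in Python. It is not how Mina (which is written in OCaml) implements or plans to implement this: the `cached` helper, the cache key, and `expensive_basis` are all hypothetical, and `expensive_basis` is only a stand-in for the expensive FFT-based computation.

```python
import pickle
import tempfile
from pathlib import Path


def cached(cache_dir, key, compute):
    """Return the value stored under key, computing and storing it on a miss."""
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    value = compute()
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated cache entry behind.
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(value, f)
    tmp.replace(path)
    return value


calls = []


def expensive_basis():
    # Stand-in for the real computation; records each invocation.
    calls.append(1)
    return [i * i for i in range(8)]


with tempfile.TemporaryDirectory() as d:
    # Hypothetical key: the cache must be keyed by everything the result
    # depends on (here, just a domain size).
    first = cached(d, "lagrange_basis_domain_8", expensive_basis)
    second = cached(d, "lagrange_basis_domain_8", expensive_basis)

print(first == second, len(calls))
```

The point that matters in practice is the cache key: if the basis depends on both the SRS and the domain size, both have to be part of the key, otherwise a stale entry could be loaded.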

  • It seems like loading the transition frontier takes up to 9m

Yikes, that seems slow. Is this coming from applying all transactions from the frontier's blocks?