Closed · mattsse closed this 3 years ago
Just tested this, for 100 blocks it took me 196 seconds on master (from a remote node), whereas with 10 tasks it took 23s and with 25 tasks 20s. This PR seems to do >90% of the work towards parallelizing the network layer / block processing. Sick work, thank you.
A next step here would be for us to further improve the database layer by batch inserting. I think the way to do this is by storing N `Evaluation`s in `MevDB` in-memory, and then `COPY`ing each batch of N evaluations to the DB instead of `INSERT`ing each time, as the latter can be inefficient.
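As a rough sketch of the batching idea, here is a hypothetical in-memory buffer that accumulates `Evaluation`s and flushes them in batches of N; the `Evaluation` struct, `BatchBuffer` type, and the `flushed` vector (standing in for the `COPY` target) are all illustrative, not the real `MevDB` API:

```rust
/// Illustrative stand-in for the real `Evaluation` type.
#[derive(Debug, Clone, PartialEq)]
struct Evaluation {
    block: u64,
}

/// Hypothetical buffer: collect evaluations and flush them `capacity`
/// at a time, instead of issuing one insert per evaluation.
struct BatchBuffer {
    capacity: usize,
    pending: Vec<Evaluation>,
    flushed: Vec<Vec<Evaluation>>, // stands in for batches sent via COPY
}

impl BatchBuffer {
    fn new(capacity: usize) -> Self {
        Self { capacity, pending: Vec::new(), flushed: Vec::new() }
    }

    /// Queue one evaluation; flush as soon as a full batch is ready.
    fn push(&mut self, eval: Evaluation) {
        self.pending.push(eval);
        if self.pending.len() >= self.capacity {
            self.flush();
        }
    }

    /// In a real DB layer this would issue a single COPY for the batch.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.flushed.push(std::mem::take(&mut self.pending));
        }
    }
}

fn main() {
    let mut buf = BatchBuffer::new(10);
    for block in 0..25u64 {
        buf.push(Evaluation { block });
    }
    buf.flush(); // flush the final partial batch
    let sizes: Vec<usize> = buf.flushed.iter().map(|b| b.len()).collect();
    println!("{:?}", sizes); // [10, 10, 5]
}
```

The trade-off is the usual one: larger batches mean fewer round-trips to the DB, at the cost of more evaluations held in memory (and lost on a crash) before they are persisted.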
Amazing! Thank you @mattsse!
Motivation
This attempts to fix #24 by adding support for spawning the processing of blocks onto separate tasks.
Solution
Introduced new `Stream` types:

- `BatchEvaluator`: a `Stream: Send + Sync` that processes multiple blocks and their inspections and yields the `Evaluation`s. `dyn Inspector`s and `dyn Reducer`s are therefore also required to be `Send + Sync`.
- `BatchInserts`: takes a `Stream` of `Evaluation`s and puts them in the DB.

The `BatchEvaluator` can be spawned onto a new task. A new `task` option in the `BlockOpts` controls how many `BatchEvaluator`s should be spawned; the range of blocks is then divided equally among the `BatchEvaluator`s, which all pipe their `Evaluation`s via channels to the `BatchInserts`. Right now a single `MevDB` handle is used, but it would be possible to add more.

Since I don't have access to an archive node, I wasn't able to test that yet. Any tips on how I can test this would be appreciated 🙃
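The fan-out/fan-in shape described above can be sketched with plain threads and channels; this is not the PR's actual code (which uses async `Stream`s), just a minimal model of splitting a block range evenly across workers that all pipe results to a single consumer, the way the `BatchEvaluator`s feed one `BatchInserts`:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Illustrative parameters: blocks [0, 100) split across 4 workers.
    let (start, end, tasks) = (0u64, 100u64, 4u64);
    let (tx, rx) = mpsc::channel::<u64>();

    let chunk = (end - start + tasks - 1) / tasks; // ceiling division
    for i in 0..tasks {
        let tx = tx.clone();
        let lo = start + i * chunk;
        let hi = (lo + chunk).min(end);
        // Each worker stands in for one BatchEvaluator processing its slice.
        thread::spawn(move || {
            for block in lo..hi {
                // Stand-in for evaluating the block and its inspections.
                tx.send(block).unwrap();
            }
        });
    }
    drop(tx); // close the channel; workers hold their own clones

    // Single consumer, standing in for BatchInserts writing to one MevDB.
    let mut received: Vec<u64> = rx.iter().collect();
    received.sort();
    assert_eq!(received, (start..end).collect::<Vec<u64>>());
    println!("processed {} blocks across {} tasks", received.len(), tasks);
}
```

Because every worker sends into the same channel, the single DB handle never sees concurrent writers, which is why one `MevDB` handle suffices here.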
Two more `batch` subcommand options are added:

- `tasks`: how many tasks should be used for fetching all the info
- `max-requests`: how many requests each task is allowed to execute concurrently
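A hedged sketch of the `max-requests` idea: a channel pre-filled with permits can act as a semaphore, so a task never has more than `max_requests` requests in flight at once. All names here are illustrative, and the sleeps stand in for real RPC calls:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    let max_requests = 3usize;

    // A channel holding `max_requests` permits acts as a counting semaphore.
    let (permit_tx, permit_rx) = mpsc::channel::<()>();
    for _ in 0..max_requests {
        permit_tx.send(()).unwrap();
    }
    let permit_rx = Arc::new(Mutex::new(permit_rx));

    // Track how many "requests" run at once, to verify the limit holds.
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let mut handles = Vec::new();
    for _block in 0..20 {
        let permit_rx = Arc::clone(&permit_rx);
        let permit_tx = permit_tx.clone();
        let in_flight = Arc::clone(&in_flight);
        let peak = Arc::clone(&peak);
        handles.push(thread::spawn(move || {
            permit_rx.lock().unwrap().recv().unwrap(); // acquire a permit
            let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
            peak.fetch_max(now, Ordering::SeqCst);
            thread::sleep(Duration::from_millis(5)); // stand-in "request"
            in_flight.fetch_sub(1, Ordering::SeqCst);
            permit_tx.send(()).unwrap(); // release the permit
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    assert!(peak.load(Ordering::SeqCst) <= max_requests);
    println!("peak concurrency: {}", peak.load(Ordering::SeqCst));
}
```

In an async codebase a `tokio::sync::Semaphore` or `StreamExt::buffer_unordered(max_requests)` would express the same bound more directly; the permit channel above is just the dependency-free version of that idea.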