Determine load testing framework for the sorts of transactions (deposits, withdrawals, txs) we expect

karlfloersch commented 3 years ago

Is your feature request related to a problem? Please describe.

To detect some common failures that we see in Geth, we often need to spam many transactions of different types. This means that we will need to design a simple framework for submitting specific types of transactions, at different frequencies, etc. Once we have this framework it can be hooked up to our CI and deployments to ensure that we do not suffer regressions.

Describe the solution you'd like

The load tester should spin up a sequencer, a verifier, and a replica.

This framework should make it easy to...

Spam the following types of transactions:

deposits
sequencer transactions
withdrawals

Define transaction sequences that are to be spammed, for example:

deploy an ERC20 contract
mint some tokens
transfer those tokens to various parties

Define transaction submission frequency distributions, or in other words:

TPS
Burstiness

Configure the load test duration, such as:

1 minute
5 hours

Collect important metrics, such as:

transaction latency
dropped transactions
CPU & Memory usage

Detect failures in our various services, eg:

Message relayer
Batch submitter

Check integrity of the sequencer:

monotonicity violations
skipped deposits
double played deposits

Check integrity of the verifier & replica:

verify the last state root matches the sequencer
all of the sequencer integrity checks defined above (eg. monotonicity violations)

This framework should be designed with the eventual goal of tracking all of these metrics and being maximally configurable. However, it should also be implemented incrementally & should suggest steps which can be taken in the near term to cover the most important features.
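
As a rough illustration of the "maximally configurable" goal, the transaction types, TPS, burstiness, and duration above could be captured in one config object that is turned into a submission schedule. This is only a sketch: `LoadTestConfig`, `ScheduledTx`, and `buildSchedule` are hypothetical names, not existing code in the monorepo, and modelling burstiness as a fixed burst size is an assumption made for simplicity.

```typescript
// All names here are hypothetical; nothing below is an existing monorepo API.
type TxKind = 'deposit' | 'sequencer' | 'withdrawal'

interface LoadTestConfig {
  kinds: TxKind[] // transaction types to cycle through
  tps: number // average transactions per second
  burstSize: number // transactions sent back-to-back; 1 means evenly spaced
  durationMs: number // total test duration, e.g. 60000 for 1 minute
}

interface ScheduledTx {
  atMs: number // offset from test start at which to submit
  kind: TxKind
}

// Build a deterministic submission schedule. Every transaction in a burst
// shares one timestamp, and bursts are spaced so the average rate stays at `tps`.
function buildSchedule(cfg: LoadTestConfig): ScheduledTx[] {
  if (cfg.tps <= 0 || cfg.burstSize < 1 || cfg.kinds.length === 0) {
    throw new Error('invalid load test config')
  }
  const total = Math.floor((cfg.tps * cfg.durationMs) / 1000)
  const burstIntervalMs = (cfg.burstSize * 1000) / cfg.tps
  const schedule: ScheduledTx[] = []
  for (let i = 0; i < total; i++) {
    const burstIndex = Math.floor(i / cfg.burstSize)
    schedule.push({
      atMs: burstIndex * burstIntervalMs,
      kind: cfg.kinds[i % cfg.kinds.length],
    })
  }
  return schedule
}
```

With `burstSize: 1` this degenerates to a uniform distribution, which could be the default for a first iteration.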

karlfloersch commented 3 years ago

Because we do not currently have infrastructure set up for easily testing verifiers or replicas, we may want to de-prioritize the "Check integrity of the verifier & replica" item.

karlfloersch commented 3 years ago

It seems that while all of these tests & metrics could be added into a big load test, we can get a lot of benefit from breaking this up into smaller tasks. Specifically:

Smoke tests

Add a TypeScript testing framework inside of integration-tests which:
- Submits a bunch of deposits
  - Record dropped transactions, failed transactions & latency
- Submits a bunch of transactions to the sequencer
  - Record dropped transactions, failed transactions & latency

Notes:

Deposits & transactions should be sent in parallel but they don't need to be maximally fast.
Provide basic configuration for things like TPS; the submission distribution can default to uniform.
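
The "record dropped transactions, failed transactions & latency" step might boil down to something like the following. It is a minimal sketch, assuming each submission has already been reduced to a hypothetical `TxResult` record; the meaning of "dropped" (for example, not included before some timeout) is left to the caller, and the choice of p50/p95 is illustrative.

```typescript
// Hypothetical record for one submitted deposit or sequencer transaction.
interface TxResult {
  status: 'confirmed' | 'failed' | 'dropped'
  latencyMs?: number // only meaningful for confirmed transactions
}

interface Summary {
  total: number
  dropped: number
  failed: number
  p50LatencyMs: number | null
  p95LatencyMs: number | null
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number | null {
  if (sorted.length === 0) return null
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Aggregate raw per-transaction results into the metrics the smoke test records.
function summarize(results: TxResult[]): Summary {
  const latencies = results
    .filter((r) => r.status === 'confirmed' && r.latencyMs !== undefined)
    .map((r) => r.latencyMs as number)
    .sort((a, b) => a - b)
  return {
    total: results.length,
    dropped: results.filter((r) => r.status === 'dropped').length,
    failed: results.filter((r) => r.status === 'failed').length,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
  }
}
```

The same summary could be produced separately for deposits and for sequencer transactions, so that a regression in one path is not hidden by the other.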

Geth integrity checker

Runs a post-check that iterates over Geth's DB and checks for:
- Double played deposits
- Skipped deposits
- Monotonicity violations
Not a test in itself but can be added as a test case after running other tests (eg. the above smoke tests)

Notes:

The complexity here will come from the caching required if the DB gets too big, so we can start out with everything in memory. There should be a clear path to reducing memory usage later, so it's fine to make this an MVP.
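
An in-memory MVP of the three checks could look roughly like this. It is a sketch under stated assumptions: `IndexedTx` is a hypothetical, simplified view of what would be read out of Geth's DB, deposit queue indices are assumed to start at 0 and be contiguous, and monotonicity is checked on the timestamp only.

```typescript
// Hypothetical, simplified view of a transaction read from Geth's DB.
interface IndexedTx {
  index: number // position of the transaction in the L2 chain
  timestamp: number // timestamp assigned to the transaction
  queueIndex: number | null // deposit queue index; null for sequencer txs
}

interface IntegrityReport {
  monotonicityViolations: number[] // tx indices whose timestamp went backwards
  doublePlayedDeposits: number[] // queue indices seen more than once
  skippedDeposits: number[] // queue indices missing below the highest one seen
}

// Single pass over the transactions in chain order, holding everything in memory.
function checkIntegrity(txs: IndexedTx[]): IntegrityReport {
  const report: IntegrityReport = {
    monotonicityViolations: [],
    doublePlayedDeposits: [],
    skippedDeposits: [],
  }
  const seen = new Set<number>()
  let highestTimestamp = -Infinity
  let maxQueueIndex = -1
  for (const tx of txs) {
    // Compare against the highest timestamp seen so far.
    if (tx.timestamp < highestTimestamp) {
      report.monotonicityViolations.push(tx.index)
    }
    highestTimestamp = Math.max(highestTimestamp, tx.timestamp)
    if (tx.queueIndex !== null) {
      if (seen.has(tx.queueIndex) && !report.doublePlayedDeposits.includes(tx.queueIndex)) {
        report.doublePlayedDeposits.push(tx.queueIndex)
      }
      seen.add(tx.queueIndex)
      maxQueueIndex = Math.max(maxQueueIndex, tx.queueIndex)
    }
  }
  // Assumes queue indices start at 0 with no intentional gaps.
  for (let q = 0; q <= maxQueueIndex; q++) {
    if (!seen.has(q)) report.skippedDeposits.push(q)
  }
  return report
}
```

Because the pass only needs the running maximum timestamp and the set of seen queue indices, it could later be switched to streaming transactions from the DB in batches to reduce memory.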

Monotonicity / skipped deposit targeted tests

Geth does not increase the timestamp when it loses connection to L1
Geth does not double ingest deposits when it stops and restarts
Geth starts up with a non-zero timestamp
Geth does not have a monotonicity violation when two transactions are sent at the same time
Geth does not have a monotonicity violation if it gets a transaction right when it boots up

Notes:

Can use a docker-compose controller library to restart Geth (we already import this into the monorepo)

Simple Performance benchmarks

Submit many transactions to the sequencer in a short period of time
- Sanity check the response time.
Measure how Geth's DB grows over time

optimisticben commented 3 years ago

I like the split a lot. While it's all testing, they serve different purposes. Instead of "kitchen sink" I would classify those as smoke tests: things that are expected to work in normal circumstances. We should be able to run smoke tests against production systems, excluding any side effects.

The performance tests should definitely target new infrastructure, and running the smoke tests at the same time verifies them under load.

For the monotonicity checks we could carefully construct the scenarios detailed above using tasks defined in something like Tekton, and/or run a "chaos monkey" in the deployment that attempts random, forced failures and watches how the system reacts.

karlfloersch commented 3 years ago

Awesome @optimisticben I updated my comment to call them "Smoke Tests" & definitely makes sense re. tekton / chaos monkey.

I'm going to close this issue and say that https://github.com/ethereum-optimism/optimism/issues/963#issuecomment-849253240 qualifies as "determining our load testing strategy". Otherwise I feel like this issue is going to stay open too long because it's so open ended 😁

cyborgdennett commented 1 year ago

Any updates on this?
