celestiaorg / celestia-app

Celestia consensus node
https://celestiaorg.github.io/celestia-app/
Apache License 2.0
342 stars 281 forks source link

Non-determinism unit test #2414

Closed evan-forbes closed 3 months ago

evan-forbes commented 1 year ago

Description

Write a non-determinism unit test for the state machine. The test will involve creating a set of transactions, executing them, and comparing the resulting apphash with an expected value.

Acceptance Criteria

evan-forbes commented 8 months ago

the goal of this test is to ensure that whatever branch is capable of syncing from scratch. In theory, we can do this with just unit tests, however there are so many different hyper werid edge cases that we don't know what all of those are yet.

for example, we could execute one transaction per block, for every possible transaction type with the v1 binary, save those roots somewhere, and then compare the roots against whatever the result is of the other branch.

we could maybe also run some deterministic (same inputs to each) fuzzing script on each version, and save those results. whatever branch would have to have identical results to process proposal.

rootulp commented 6 months ago

Just noting the OP proposes two distinct tests so we could split this into two issues:

  1. Create a test that compares the results of processing and executing txs on the main branch for each app version.
  2. Create a significantly longer test that syncs mainnet from scratch using a block data archive node.

IMO the first one is more valuable than the second because I expect we'll be able to run the first one in CI and make it a required check prior to PR merge. I expect the second test will take much longer to run and may only be able to run via the nightly (or weekly 😨) CI jobs. That means we'll get delayed signal via the second test. If we're committing to single binary syncs then IMO we still need the second test. But ideally the first test gives us confidence to merge PRs and the second test gives us confidence that new release candidates still preserve the single binary feature.

for example, we could execute one transaction per block, for every possible transaction type with the v1 binary, save those roots somewhere, and then compare the roots against whatever the result is of the other branch.

I really like this idea.

rootulp commented 6 months ago

We discussed this in a sync and came to the conclusion that this test:

  1. Create a test that compares the results of processing and executing txs on the main branch for each app version.

is a nice to have.

  1. Create a significantly longer test that syncs mainnet from scratch using a block data archive node.

is release blocking but it can be performed manually via the mainnet.sh script.

rootulp commented 6 months ago

Evan and I paired on a prototype for the first test: https://github.com/celestiaorg/celestia-app/tree/rp/pair-with-evan

evan-forbes commented 4 months ago

If we take the route of expanding the existing knuu infrastructure, we could run this test directly after a PR is merged instead of nightly.

rootulp commented 4 months ago

Spun out https://github.com/celestiaorg/celestia-app/issues/3490 so that this issue can focus on the first test in the OP:

Create a test that compares the results of processing and executing txs on the main branch for each app version.

rootulp commented 4 months ago

Notes:

Ideas:

  1. Consider writing a manual version of this test using local_devnet. Note: if we do this, we should add a guide on how to create releases and add running the manual test as a step in the release guide.
  2. Consider exploring how other chains (Cosmos Hub, Osmosis) test for version compatibility. However it's likely they don't have coverage for our use case (rolling binary upgrades).
  3. Consider continuing with https://github.com/celestiaorg/celestia-app/tree/rp/pair-with-evan
cmwaters commented 4 months ago

So my take on this is that we already have a script for syncing a node from genesis. Which we should use before every release. If we want it to be faster then we can have some beefy machine we run and just spin up a new node in the same machine (to skip the p2p overhead). Or if we wanted this to be a unit test we could persists only the block and state stores and run the application over the set of transactions (kind of like a local block sync) and compare the app hashes.

The second thing I would advocate for is to add the main docker image to the TestMinorVersionCompatibility. That way we have main (using v1) and all our other v1 tags working together. If we feel txsim is not doing a sufficient job we can expand on the amount of different messages it sends.

When we upgrade to v2, we will need to modify TestMinorVersionCompatibility to move from v1 state machine to v2 state machine for future v2 minor releases. We could create more elaborate tests but I want to review our single binary syncs approach before continuing.