consensus-shipyard / ipc

Apache License 2.0

Deterministic genesis (+ import/export commands) #1002

Open cryptoAtwill opened 1 month ago

cryptoAtwill commented 1 month ago

Introduction

When a follower node wants to join a network, it can hit an app hash mismatch at the genesis block when starting fendermint.

This is because:

For 1, it is most likely human error; using the correct genesis configuration file should fix it.

The more problematic one is the second scenario. It happens when, say, a 3-node cluster is set up and the ipc-contracts are later updated. When a 4th node then tries to join the network, its ipc-contracts bytecode is not the same as when the original 3 nodes were set up, leading to a wrong app_hash.

The root cause of scenario 2 is that some configuration parameters are not in the genesis file. These parameters are:

To fix this, one must make sure the data loaded into fendermint at genesis is the same on every node. There are a few approaches:

1. Export/Import genesis

When init_chain is called from cometbft, once the genesis state is created, we force a genesis snapshot to be taken and write this snapshot to disk. This snapshot can be uploaded to some public space so others can access it. Then, when fendermint is launched, one can pass the snapshot either through a path or a URL to indicate that the state is loaded from the genesis snapshot. Of course, one needs to ensure the correct genesis snapshot is used.
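As a rough illustration of the import side, here is a minimal Python sketch of resolving a snapshot from either a path or a URL. The function name `read_genesis_snapshot` and the optional `expected_sha256` check are assumptions made for this example; they are not existing fendermint commands or options.

```python
import hashlib
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional


def read_genesis_snapshot(source: str, expected_sha256: Optional[str] = None) -> bytes:
    """Load snapshot bytes from an http(s) URL or a local path (hypothetical helper)."""
    scheme = urllib.parse.urlparse(source).scheme
    if scheme in ("http", "https"):
        with urllib.request.urlopen(source) as resp:
            data = resp.read()
    else:
        data = Path(source).read_bytes()

    # Optional guard against loading the wrong snapshot: compare against a
    # digest published alongside the snapshot (an assumption of this sketch).
    if expected_sha256 is not None:
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_sha256:
            raise ValueError(f"snapshot hash mismatch: got {actual}")
    return data
```

Publishing a digest next to the snapshot would give operators a cheap way to confirm they picked up the intended file before starting the node.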

2. Include everything in genesis

Instead of passing in builtin_actors_bundle and all those paths in the fendermint configuration file, we can put the actual bytes in the genesis file. This would result in something like:

{
    "builtin-actor-bundle": "0x....",
    "custom-actor_bundle": {
        "eam": "0x...",
        ...
    },
    "ipc-contracts": {
        "gateway": "0x...."
        ...
    }
}

We could add content hashes to the above genesis file and update the genesis generation command to include these parameters. Of course, the downside is a bloated genesis file, but perhaps this is a small price for consistency.
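To make the idea concrete, here is a minimal Python sketch of a generation step that embeds the bytes together with their content hashes. The `{"bytes", "sha256"}` entry layout and the function names are assumptions for illustration only; the top-level keys are copied from the example above, and the input bytes are dummy values.

```python
import hashlib
import json


def embed(blob: bytes) -> dict:
    """Return an entry holding the raw bytes (hex) and their SHA-256 digest."""
    return {
        "bytes": "0x" + blob.hex(),
        "sha256": hashlib.sha256(blob).hexdigest(),
    }


def build_genesis(builtin_bundle: bytes, custom_actors: dict, contracts: dict) -> dict:
    """Assemble the genesis fragment from in-memory artifacts (hypothetical layout)."""
    return {
        "builtin-actor-bundle": embed(builtin_bundle),
        "custom-actor_bundle": {name: embed(b) for name, b in custom_actors.items()},
        "ipc-contracts": {name: embed(b) for name, b in contracts.items()},
    }


def serialize(genesis: dict) -> str:
    # Sorted keys and fixed separators give identical output for identical
    # input, regardless of the order in which artifacts were supplied.
    return json.dumps(genesis, sort_keys=True, separators=(",", ":"))


# Dummy placeholder bytes, not real bundles or bytecode.
genesis = build_genesis(b"\x01\x02", {"eam": b"\x03"}, {"gateway": b"\x04\x05"})
print(serialize(genesis))
```

Because the artifacts are frozen into the file at generation time, a node joining later reads the same bytes as the original nodes, whatever the repository or toolchain looks like by then.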

3. Include hashes and links

Instead of putting raw bytes into the genesis, maybe we can just add the URL and sha256 hash of each path's contents. In a way, the genesis file does not care where one fetches the content from, as long as the hashes match. When they do not match, one can quickly tell which parameter is causing the inconsistency, and that is good enough. But of course, the deployer needs to know where to get the historical content, because most likely what the deployer has is the latest code.
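A minimal Python sketch of the check described here, reporting which parameter is inconsistent. The function name `find_mismatches` and the flat `name -> sha256` mapping are assumptions for this example, and the byte strings are dummy values.

```python
import hashlib


def find_mismatches(expected: dict, contents: dict) -> list:
    """Return the names whose fetched bytes are missing or do not match.

    expected: name -> hex SHA-256 digest recorded in the genesis file
    contents: name -> bytes the deployer actually fetched or built
    """
    mismatched = []
    for name in sorted(expected):
        blob = contents.get(name)
        if blob is None or hashlib.sha256(blob).hexdigest() != expected[name]:
            mismatched.append(name)
    return mismatched


# Digests recorded when the original nodes were set up (dummy bytes).
original = {"gateway": b"old gateway bytecode", "eam": b"eam actor"}
expected = {name: hashlib.sha256(b).hexdigest() for name, b in original.items()}

# A later deployer has newer gateway bytecode but the same eam actor.
local = {"gateway": b"new gateway bytecode", "eam": b"eam actor"}
print(find_mismatches(expected, local))
```

With the inputs above, only `gateway` is reported, which tells the deployer which historical artifact to go and fetch.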

raulk commented 1 month ago

Context

The genesis process is rather complex, counterintuitive, and spread out across a number of components. Just a few high-level notes on general shape:

At some point, we'll need to tidy up this complex dance. But for now, let's stay focused. Our goal is to make sure that new nodes syncing from genesis have a deterministic genesis block to use.

What's currently holding us back is the runtime dependency on IPC EVM contracts, built-in actors, and custom actors. These are built as part of the IPC build and "annexed" to the genesis during InitChain by loading paths.

Consequently, our approach is unacceptably sensitive to IPC contract source changes in the repo, the Solidity compiler version and parameters, the Rust toolchain version, etc.

Discussion

I think we should go with option (2), but introduce a version number so we can later transition to (3) (i.e. offloading binaries and linking to them by CID).
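A hedged sketch, in Python, of how a version number could let a loader support both layouts. The version values, the entry fields (`bytes`, `url`, `sha256`) and the `fetch` callback are all assumptions for illustration; SHA-256 stands in here for the CID-based linking mentioned above.

```python
import hashlib
from typing import Callable


def load_artifact(entry: dict, version: int, fetch: Callable[[str], bytes]) -> bytes:
    """Resolve one genesis artifact according to a hypothetical layout version."""
    if version == 1:
        # Option (2): the bytes are inlined in the genesis file as hex.
        hex_str = entry["bytes"]
        if hex_str.startswith("0x"):
            hex_str = hex_str[2:]
        return bytes.fromhex(hex_str)
    if version == 2:
        # Option (3): the genesis only links to the content; verify after fetching.
        blob = fetch(entry["url"])
        if hashlib.sha256(blob).hexdigest() != entry["sha256"]:
            raise ValueError(f"hash mismatch for {entry['url']}")
        return blob
    raise ValueError(f"unsupported genesis version: {version}")
```

Rejecting unknown versions outright keeps an older node from silently misreading a newer genesis layout.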

Take a look at Fluence's Kras genesis: https://cometbft.kras.fluence.dev/genesis

The desired end state is for the app_state to include:

We should provide a tool to regenerate the genesis retroactively so Fluence can update their published genesis.

As an added benefit, nothing at runtime will depend on the Hardhat tool any longer: https://github.com/consensus-shipyard/ipc/blob/4a2acb64112d198afeb17b29022996fb6e13d894/fendermint/eth/hardhat/src/lib.rs#L39

^^ we should use that tool only in the genesis generation commands.

cryptoAtwill commented 1 month ago

@raulk yeah, sounds good, we can start with 2 first, then build 3 on top of it. Will add the version as well.