celestiaorg / celestia-core

Celestia node software based on Tendermint.
https://celestia.org/
Apache License 2.0

Add optimistic rollup support for Cosmos SDK #62

Open musalbas opened 3 years ago

musalbas commented 3 years ago

The purpose of this issue is to start a discussion about adding support for optimistic rollups to the Cosmos SDK, and what exactly this means and entails.

There are two key questions: a) what does it mean to add optimistic rollup support for Cosmos and b) what components or modifications to Tendermint and Cosmos would this require?

What does it mean to add optimistic rollup support for the Cosmos SDK?

We want to make it possible for people to create blockchains using the Cosmos SDK, and deploy these chains as an optimistic rollup that uses another chain (such as LazyLedger) as a consensus and data availability layer.

Concretely, this means that instead of Cosmos chains using Tendermint BFT for consensus, they would have one or more aggregators that create blocks and post them to the data availability layer; no consensus is required from the Cosmos app side. However, these chains would still require their own peer-to-peer network with their own mempools to propagate transactions to other nodes and aggregators, which may generate fraud proofs.

Sub-question: Do we want to implement optimistic rollup support for Tendermint more generally, or just Cosmos SDK? Would it even make sense to add optimistic rollup support for Tendermint? It seems that it might not, because Tendermint itself doesn't define an execution environment, it uses ABCI to communicate with the execution environment. An optimistic rollup is an execution environment in itself, which is what we need to define. Specifically, an optimistic rollup execution environment needs to have a standardised fraud proof system.

What components or modifications to Tendermint and Cosmos would this require?

It seems that there are three main components to think about to implement Cosmos SDK chains as optimistic rollups. Other components may be missing that are not listed here, but this is what comes to mind immediately.

Fraud proof support for the Cosmos SDK

At minimum this would require modifying Cosmos block output in some way to include intermediate state roots, which can be used for fraud proofs. We need to investigate if this would require any modifications to Tendermint (or ABCI client). It seems like this can be done purely on the Cosmos SDK side, by appending intermediate state roots to transactions.
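
To make the idea of per-transaction intermediate state roots concrete, here is a minimal sketch in Go. Everything in it is an assumption for illustration: `State`, `Tx`, `Block` and `BuildBlock` are hypothetical names, and the state root is a flat hash over sorted key/value pairs rather than the Merkle tree a real chain would need.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// State is a toy key-value state standing in for the app's store (hypothetical).
type State map[string]string

// Root returns a deterministic commitment to the state: a hash over the
// sorted key/value pairs. A real chain would use a Merkle tree instead, so
// that a fraud prover can supply only the parts of the state a tx touches.
func (s State) Root() [32]byte {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each field so ("ab","c") and ("a","bc") hash differently.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(s[k]), s[k])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Tx is a hypothetical transaction that sets a single key.
type Tx struct{ Key, Value string }

// Block pairs every transaction with the state root reached after applying it.
type Block struct {
	Txs               []Tx
	IntermediateRoots [][32]byte
}

// BuildBlock applies txs in order and records the state root after each one.
func BuildBlock(s State, txs []Tx) Block {
	b := Block{Txs: txs}
	for _, tx := range txs {
		s[tx.Key] = tx.Value
		b.IntermediateRoots = append(b.IntermediateRoots, s.Root())
	}
	return b
}

func main() {
	b := BuildBlock(State{}, []Tx{{"alice", "10"}, {"bob", "5"}})
	for i, r := range b.IntermediateRoots {
		fmt.Printf("after tx %d: %s\n", i, hex.EncodeToString(r[:]))
	}
}
```

With one root per transaction, a dispute can be narrowed down to a single state transition instead of the whole block.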

Furthermore, we would need to ensure that there is only one commitment in the block header that commits to state. This may require removing other state-related commitments such as validator set etc (which we wouldn't need for a chain that doesn't have its own consensus anyway). This may require Tendermint/ABCI client modification.

Question: is ABCI compatible with the use case of requiring intermediate state roots to be added to transactions after the user has submitted them: presumably after CheckTx passes, the intermediate state root can be appended to the transaction before DeliverTx?

Replace BFT with aggregator(s) / build an optimistic rollup-based ABCI client

Optimistic rollup chains do not require their own consensus, as they use the data availability layer for ordering. Thus, the ABCI client that an optimistic rollup-based Cosmos chain uses should allow aggregators to create blocks, rather than pass blocks through Tendermint BFT. There seem to be two ways to approach this:

  1. Modify Tendermint to replace BFT with aggregator(s). This may be cumbersome as it would basically be stripping out the core component of Tendermint (consensus) and modifying it to do something it wasn't designed to do, but it would mean we could take advantage of existing code such as the peer-to-peer layer (which might not be suitable for optimistic rollup chains anyway).
  2. Create our own ABCI client (we could call it Optimint or Lazymint) that is based on aggregator(s) rather than BFT consensus. This may be cleaner. We would however need to integrate our own peer-to-peer network stack, mempool storage, etc. This ABCI client could be designed to allow for any pluggable data availability layer, as well as any ABCI server.
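
As a rough illustration of option 2, the sketch below shows an aggregator that drives an app and posts blocks to a pluggable data availability layer. The `App` and `DALayer` interfaces are hypothetical simplifications, not the real ABCI or LazyLedger APIs, and the block encoding is deliberately naive.

```go
package main

import (
	"errors"
	"fmt"
)

// App is a hypothetical, heavily simplified stand-in for the ABCI server
// (the Cosmos SDK app). It is not the real ABCI interface.
type App interface {
	DeliverTx(tx []byte) error // an error means the tx is invalid
	Commit() []byte            // returns the resulting state root
}

// DALayer is a hypothetical pluggable data availability layer.
type DALayer interface {
	SubmitBlock(block []byte) (height uint64, err error)
}

// Aggregator creates blocks without running any consensus of its own.
type Aggregator struct {
	app     App
	da      DALayer
	mempool [][]byte
}

// AddTx queues a transaction received from the peer-to-peer network.
func (a *Aggregator) AddTx(tx []byte) { a.mempool = append(a.mempool, tx) }

// ProduceBlock executes the queued txs, drops invalid ones, appends the
// state root, and posts the result to the DA layer, which provides ordering.
func (a *Aggregator) ProduceBlock() (uint64, error) {
	if len(a.mempool) == 0 {
		return 0, errors.New("mempool is empty")
	}
	var block []byte
	for _, tx := range a.mempool {
		if err := a.app.DeliverTx(tx); err != nil {
			continue // skip invalid txs
		}
		block = append(block, tx...) // naive encoding, for illustration only
	}
	a.mempool = nil
	block = append(block, a.app.Commit()...)
	return a.da.SubmitBlock(block)
}

// counterApp is a toy app whose "state root" is the number of bytes seen.
type counterApp struct{ bytesSeen int }

func (c *counterApp) DeliverTx(tx []byte) error {
	if len(tx) == 0 {
		return errors.New("empty tx")
	}
	c.bytesSeen += len(tx)
	return nil
}

func (c *counterApp) Commit() []byte {
	return []byte(fmt.Sprintf("|root=%d", c.bytesSeen))
}

// memoryDA is an in-memory DA layer used only for this example.
type memoryDA struct{ blocks [][]byte }

func (m *memoryDA) SubmitBlock(block []byte) (uint64, error) {
	m.blocks = append(m.blocks, block)
	return uint64(len(m.blocks)), nil
}

func main() {
	da := &memoryDA{}
	agg := &Aggregator{app: &counterApp{}, da: da}
	agg.AddTx([]byte("send 1"))
	agg.AddTx([]byte("send 2"))
	height, err := agg.ProduceBlock()
	fmt.Println(height, err, string(da.blocks[0]))
}
```

Because the aggregator depends only on the two interfaces, either side could be swapped out, which is the pluggability the option describes.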

Add third party data availability checks to the block validity rule

When Tendermint or the ABCI client receives blocks, it needs to check that these blocks have been made available on an external data availability layer such as LazyLedger, using e.g. data availability proofs. This would require integrating a LazyLedger light client into the ABCI client.

zmanian commented 3 years ago

Some other ideas.

ABCI applications essentially need a fraud proof mode where they get an intermediate state and a tx, and then compute the next state root.

You also need a mode that lets you replay a block while computing the intermediate roots.

There also needs to be some kind of paradigm for hot-loading code for fraud proofs. Via containers? gVisor? Firecracker VMs?

musalbas commented 3 years ago

Can you elaborate on why code would need to be hot loaded? The code for state transitions should be the same code defined by the app.

zmanian commented 3 years ago

Any nodes that need to witness/validate the fraudulent state transition.

musalbas commented 3 years ago

They should have the code (i.e. state machine) for the Cosmos app, and then they could validate the state transition fraud proof by loading an instance of the app (state machine) that uses a database backend that contains the state of the app at the point of the contested state transition. This state is provided by the fraud prover, in the form of state tree Merkle proofs. I don't think any fancy VM technology would be needed for this.
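
A minimal sketch of that verification flow, under loud assumptions: the witness here is the entire toy state checked against a flat hash, whereas the real design would use Merkle proofs covering only the state the contested tx touches. `State`, `Tx`, `Apply` and `FraudProof` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"sort"
)

// State is a toy account-balance state (hypothetical).
type State map[string]int

// Root hashes the sorted balances. A real chain would use a Merkle tree.
func (s State) Root() [32]byte {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%d:%s=%d;", len(k), k, s[k])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Tx is a toy transfer.
type Tx struct {
	From, To string
	Amount   int
}

// Apply is the state transition function: the same code the app runs in
// normal operation. It leaves the state untouched when the tx is invalid.
func Apply(s State, tx Tx) error {
	if tx.Amount <= 0 || s[tx.From] < tx.Amount {
		return errors.New("invalid transfer")
	}
	s[tx.From] -= tx.Amount
	s[tx.To] += tx.Amount
	return nil
}

// FraudProof contests a single state transition.
type FraudProof struct {
	PreRoot         [32]byte // agreed-upon root before the tx
	Tx              Tx
	ClaimedPostRoot [32]byte // root published by the aggregator
	Witness         State    // state at PreRoot, supplied by the fraud prover
}

// Verify reports whether the proof demonstrates fraud, i.e. whether
// re-executing Tx on the witnessed state yields a different root.
func (p FraudProof) Verify() (bool, error) {
	if p.Witness.Root() != p.PreRoot {
		return false, errors.New("witness does not match the pre-state root")
	}
	s := make(State, len(p.Witness))
	for k, v := range p.Witness {
		s[k] = v // copy so the caller's witness is not mutated
	}
	// In this toy model an invalid tx leaves the state unchanged,
	// so the error is ignored on purpose.
	_ = Apply(s, p.Tx)
	return s.Root() != p.ClaimedPostRoot, nil
}

func main() {
	pre := State{"alice": 10}
	bad := State{"alice": 10, "bob": 3} // aggregator forgot to debit alice
	p := FraudProof{PreRoot: pre.Root(), Tx: Tx{"alice", "bob", 3}, ClaimedPostRoot: bad.Root(), Witness: pre}
	fmt.Println(p.Verify())
}
```

The verifier only needs the app's own `Apply` logic plus the witnessed state, which is the point being made about not needing special VM technology.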

zmanian commented 3 years ago

I look forward to being proven wrong!

liamsi commented 3 years ago

A few remarks:

It seems like this can be done purely on the Cosmos SDK side, by appending intermediate state roots to transactions.

I don't think this works. E.g. looking at https://docs.cosmos.network/master/basics/tx-lifecycle.html it does not seem like a viable approach, as the Txs are not meant to be modified after e.g. CheckTx. (@marbar3778 can you confirm this?).

Also, I feel like if this gets abstracted a bit to something like "the interaction between tendermint & the cosmos sdk needs a way to add additional (tx related) data to the Txs", this could be very beneficial for the sdk and tendermint independent of ORUs and intermediate state roots. That is why issues like the pre-process one exist (cc @ValarDragon).

Furthermore, we would need to ensure that there is only one commitment in the block header that commits to state. This may require removing other state-related commitments such as validator set etc (which we wouldn't need for a chain that doesn't have its own consensus anyway). This may require Tendermint/ABCI client modification.

Regarding state, the app hash is the only commitment to the (full) state. I think the SDK itself does not rely much on particular fields in the block header as tendermint does (e.g. the validator roots are required for tendermint and the tendermint light client to function properly as tendermint does not understand the notion of the abci app's state).

It sounds like for the ORU-based "abci-client" there are more substantial changes necessary anyway (like stripping out the consensus). Hence, if we change the block header as suggested, it isn't really a change to tendermint we'd like to see upstreamed; it is more of a change that would live in a fork of tendermint or in the built-from-scratch "abci-client" that serves the same purpose (I like the names Optimint or Lazymint BTW). We'd have to make it easy for devs using the SDK with the modified abci-client to track the validator sets in the single state commitment (if they are necessary at all).

Question: is ABCI compatible with the use case of requiring intermediate state roots to be added to transactions after the user has submitted them: presumably after CheckTx passes, the intermediate state root can be appended to the transaction before DeliverTx?

See above. Using vanilla tendermint, I doubt that is possible. If you replace tendermint with "lazymint" (but rely on ABCI), you'd still need a way to get these intermediate state roots back into the block. But if we use our own abci-client, then we can also do a slightly cleaner separation and have a dedicated field in the block for intermediate state roots. Regarding ABCI: it does not support that directly either, but if we are modifying/implementing the underlying abci-client, we can probably sneak the intermediate state roots into the replies of ResponseCheckTx or maybe ResponseDeliverTx and make lazymint write them into the field. A cleaner way would be the above-mentioned pre-process approach.
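
To illustrate the "sneak it into the reply" idea, here is a rough sketch of the abci-client side. The `ResponseDeliverTx` type below is a local, simplified stand-in defined just for this example (not the real Tendermint type), and `App`, `Block` and `BuildBlock` are hypothetical.

```go
package main

import "fmt"

// ResponseDeliverTx is a local, simplified stand-in for the ABCI reply.
// In this sketch the app puts the post-tx state root into Data.
type ResponseDeliverTx struct {
	Code uint32 // 0 means success
	Data []byte // assumed here to carry the intermediate state root
}

// App is a hypothetical, minimal view of the ABCI server.
type App interface {
	DeliverTx(tx []byte) ResponseDeliverTx
}

// Block has a dedicated field for intermediate state roots, so the
// transactions themselves never need to be modified after submission.
type Block struct {
	Txs                    [][]byte
	IntermediateStateRoots [][]byte
}

// BuildBlock executes each tx and copies the root from the app's reply
// into the dedicated block field. Failed txs are left out of the block.
func BuildBlock(app App, txs [][]byte) Block {
	var b Block
	for _, tx := range txs {
		res := app.DeliverTx(tx)
		if res.Code != 0 {
			continue
		}
		b.Txs = append(b.Txs, tx)
		b.IntermediateStateRoots = append(b.IntermediateStateRoots, res.Data)
	}
	return b
}

// lenApp is a toy app whose "state root" is the running total of tx bytes.
type lenApp struct{ total int }

func (a *lenApp) DeliverTx(tx []byte) ResponseDeliverTx {
	if len(tx) == 0 {
		return ResponseDeliverTx{Code: 1}
	}
	a.total += len(tx)
	return ResponseDeliverTx{Data: []byte(fmt.Sprintf("root-%d", a.total))}
}

func main() {
	b := BuildBlock(&lenApp{}, [][]byte{[]byte("ab"), nil, []byte("c")})
	for i, r := range b.IntermediateStateRoots {
		fmt.Printf("tx %q -> %s\n", b.Txs[i], r)
	}
}
```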

Regarding the two options I don't have a good intuition yet which would be easier. But a few notes:

Modify Tendermint to replace BFT with aggregator(s).

It is possible to replace the consensus reactor: https://github.com/lazyledger/lazyledger-core/blob/7b84a4c74317453c6fd4192b1b82619328b1b97c/node/node.go#L131-L154

There is an ongoing effort to make tendermint usable in different "modes" where only certain reactors are turned on. This work could give a rough idea how to approach this. More info:

Create our own ABCI client (we could call it Optimint or Lazymint) that is based on aggregator(s) rather than BFT consensus.

Yes, that could certainly be much cleaner. On the other hand this might take much longer too (peer-to-peer network stack, mempool, storage etc. are already ready to be used in tendermint).

In order to decide which approach to take, I'd propose to describe the requirements of "lazymint" in more detail. It might turn out that we'd be better off starting with a fork of tendermint and disabling the reactors we don't need. Or it might turn out that the differences still outweigh the similarities and we are better off starting from scratch (we might still be able to use parts of tendermint as libraries to get started).

liamsi commented 3 years ago

I'm not sure why we would need to hotload code via a container as @zmanian mentioned 🤔

It seems to me the problem of connecting different chains with each other (dynamically and without much or any coordination) came up already in the context of Dynamic IBC by the Agoric team. Here is a post on that project by @dtribble: https://medium.com/agoric/the-road-to-dynamic-ibc-4a43bc964bca Also related in this context is CosmWasm by @ethanfrey. The above-mentioned problem & approach partly motivated this proposal for the Cosmos Hub to integrate CosmWasm.

cwgoes commented 3 years ago

As IBC deals with message passing between blockchains with their own ordering mechanisms, I don't think it will be very useful in constructing the right interface between the optimistic rollup state machine and LazyLedger; ABCI sounds closer to the right starting point there. But IBC might be helpful as a standard for communication between multiple rolled-up chains, which presumably will be executing asynchronously from each other (so IBC's asynchronous model fits). The client/connection/channel abstraction set could be used mostly unmodified, but internally the LazyLedger chain would implement special clients which can check cross-chain packets directly once the relevant blocks have been ordered by LazyLedger. This should be pretty efficient.

ValarDragon commented 3 years ago

@musalbas

They should have the code (i.e. state machine) for the Cosmos app, and then they could validate the state transition fraud proof by loading an instance of the app (state machine) that uses a database backend that contains the state of the app at the point of the contested state transition. This state is provided by the fraud prover, in the form of state tree Merkle proofs. I don't think any fancy VM technology would be needed for this.

Could you explain this a bit more? As I understand it, you're saying that the L1 should have the code for the L2 state machine hardcoded, and not expressed in some VM language. This doesn't seem like it should be true to me, unless the L1 only has to verify a known subset of the L2 state machine's code or only supports L2s that use the hardcoded state machine.

I do think hardcoding the lazy-ledger state machine into the L1 does make sense if it's using it for data availability proofs. If this is what's happening, maybe there is terminology confusion. Perhaps phrasing it as the L1 hardcodes a "fixed ORU", and VM-based L1s support "arbitrary ORUs", may mitigate the confusion?

musalbas commented 3 years ago

In order for chain A to verify the state transition fraud proofs for chain B, it must have the state machine code for that chain. By default, the state machine of Cosmos apps is run in an environment that isn't sandboxed or deterministic (i.e. Golang). Therefore for security reasons, chain A cannot dynamically run and execute the state machine of chain B without a chain upgrade approved by the social consensus.

If however, the state machine for the Cosmos app is developed in an environment that is sandboxed and deterministic, such as a modified version of WASM, then dynamic loading of code would be possible. I believe the CosmWasm project can help with this.

ethanfrey commented 3 years ago

I would love to see an experiment of optimistic rollup of general cosmwasm contracts.

We approach 1.0 shortly, and that would be a good place to start experimenting, as there will be no more versioning issues.

One issue is we allow callbacks to native sdk code, in particular staking and bank. That can be mocked out for a secure runtime, but there is an issue of the sdk integration to differentiate from pure contracts. Also, the auth and fee processing takes place in native code.

zmanian commented 3 years ago

My sense is that CosmWasm isn't necessarily a good fit here because it isn't designed to be an entire state machine.

CosmWasm is designed to extend the capabilities of the Cosmos SDK state machine rather than be an entire state machine unto itself.

ethanfrey commented 3 years ago

If we restrict the rollup to a subset of cosmwasm contracts that only interact with the bank module, this should be just as complete a VM as the Ethereum VM (which handles sending native tokens and checking balances).

Since we allow callbacks into custom blockchain code, full cosmwasm support would be at least as hard as generic sdk support. We could also not include ibc interactions in the rollup. But I think a limited subset could be considered achievable, and that limited subset would actually be useful for many defi applications.

rootulp commented 1 year ago

Create our own ABCI client (we could call it Optimint or Lazymint) that is based on aggregator(s) rather than BFT consensus.

Linking to https://github.com/celestiaorg/optimint which took this approach

evan-forbes commented 4 months ago

this issue could still be relevant for rollkit, but depending on https://github.com/celestiaorg/celestia-core/issues/84#issuecomment-1956269174 we might not need this issue as well

musalbas commented 4 months ago

We could also do ZK fraud proofs of entire blocks at a time, which wouldn't require intermediate state roots, if feasible.
