colin-axner opened this issue 10 months ago
A few corrections / additional info:
> At a very high level, an optimistic chain will use the batcher as its mempool. Transactions for an L2 can be submitted to the batcher or directly to the L1. The batcher will batch L2 transactions and submit these to the L1 as an ordered list of transactions for L2 block execution.
The execution engine (aka op-geth in OP Labs' version, a fork of geth) holds the mempool. The batcher is a very simple component that queries the op-node (to retrieve the span of blocks not yet submitted, between the "unsafe head" and the "safe head") and the engine (to get the actual blocks), batches them, and submits them to the DA (at present: Ethereum calldata, unless you fork the stack).
The mempool can be public or private (this is just normal geth stuff); it's private on Optimism and, I think, on most big OP stack chains (Base, etc.).
> In a separate process, the op-proposer will query an L2 rpc (the sequencer rpc) for blocks which are available to be submitted back to the L1.
You're describing the batcher; the proposer queries the "finalized head" from the op-node, and posts its blockhash (and a few other roots derived from it).
The op-batcher, op-proposer, and op-node are all separate services.
And so is the execution engine (op-geth).
> 1. ibc-go receives a MsgRecvPacket relay message.
> 2. ibc-go emits an event to indicate a cross domain message should be sent to the related smart contract for the OnRecvPacket callback.
> 3. an op-geth transaction initiates a cross domain message relay to a smart contract for the OnRecvPacket callback.
> 4. the smart contract executes the callback, producing an emitted event
> 5. ibc-go is fed the OnRecvPacket result via the event emitted by the smart contract
I'm very unfamiliar with IBC, but is it possible to decouple the networking layer from the light client execution?
If so, what I would imagine as a design is to have an independent service responsible for network communication. Upon receiving a MsgRecvPacket, it wraps it into a transaction, which it sends to the op-node (alternatively, the op-node can poll the IBC network service, just like it polls the L1).
This transaction can either be a new transaction type (to be implemented within op-geth — which already has custom transaction types) or some kind of regular transaction that hits a new pre-compile contract. In both cases, the goal is that during processing by op-geth, the implementation goes to ibc-go and performs the required verifications there, then the requested EVM call.
The IBC network service would then listen to blocks created by op-geth, in particular emitted events, and transmit sent messages to the appropriate chains.
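The proposed service can be sketched roughly as below. Every type and name here is hypothetical (neither op-node nor ibc-go expose these); the point is only the shape of the wrap-and-forward step.

```go
package main

import "fmt"

// Hypothetical sketch of the proposed IBC network service: it receives IBC
// relay messages over the network, wraps them into transactions for the
// op-node, and would separately watch op-geth blocks to transmit outgoing
// messages. All types and names are illustrative.

// MsgRecvPacket stands in for the IBC relay message received off the wire.
type MsgRecvPacket struct {
	Sequence uint64
	Data     []byte
}

// Tx stands in for whatever transaction envelope the op-node accepts.
type Tx struct {
	Kind    string
	Payload []byte
}

// WrapForOpNode wraps a received IBC message into a transaction that can be
// submitted to (or polled by) the op-node.
func WrapForOpNode(msg MsgRecvPacket) Tx {
	return Tx{
		Kind:    "ibc/MsgRecvPacket",
		Payload: append([]byte(fmt.Sprintf("seq=%d:", msg.Sequence)), msg.Data...),
	}
}

func main() {
	tx := WrapForOpNode(MsgRecvPacket{Sequence: 7, Data: []byte("packet")})
	fmt.Println(tx.Kind)
}
```

The outgoing direction (listening to emitted events and relaying to counterparty chains) would be a symmetric unwrap step in the same service.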
> This would require 3 rollup blocks to be produced to perform the receive packet step (normally executes entirely within a single transaction within ibc-go)
This might be the part I understood the least in that section, and maybe my proposal above is hogwash because of it. It seems to entail that receiving a network packet needs to be a block execution step, and furthermore one in its own separate block.
Maybe it's due to confusion around events? Like step (2) above produces (in my understanding) a "go-style" event, whereas step (4) produces an EVM event.
Thank you so much @norswap for your participation and engagement! I apologize for the delayed reply.
I have updated the issue with your corrections, thank you! These were very useful context fixes for me.
I will reply to your questions and suggestions a bit indirectly in the post below. It would be great to have your input on the approach we took and if there's any design decisions you would have done otherwise.
With tremendous efforts from @srdtrk and @DimitrisJim we spent a few weeks creating a proof of concept for the second approach in this issue. This work builds off the existing work Polymer has done with monomer and goes a bit further exploring the possibility of having two execution engines on the same rollup.
We managed to execute EVM and SDK transactions independently of each other and have the results be fed into a single block returned back to the op-node. Due to time constraints, we were unable to find solutions which allowed us to relay the results of execution in one execution engine to the other state machine.
Below is an architecture diagram which resulted from this proof of concept:
The proof of concept is held up by three repositories:
Note: In reality, there is no "sequencing" and "derivation" mode. Both "sequencing" and "derivation" occur in the same process. I separated the two as I find it conceptually much easier to reason about since they have different code traces. Using my terms, in sequencing mode the execution engine is responsible for gathering transactions which are executed and included in a block based on a timer which is managed by op-node and communicated via the engine API. In derivation mode, op-node will provide a list of ordered transactions. My understanding is that sequencing and derivation mode are run in the same process in order to handle the unhappy path where the derivation logic needs to interrupt the sequencing logic due to a reorg or mismatch in block hashes.
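The distinction between the two conceptual modes can be sketched as follows. `Mode`, `BlockTxs`, and the string transaction representation are illustrative, not op-node types.

```go
package main

import "fmt"

// Illustrative sketch of the two conceptual modes described above; the real
// op-node runs both in a single process so derivation can interrupt
// sequencing on a reorg or block hash mismatch.

type Mode int

const (
	Sequencing Mode = iota // engine gathers txs itself; block cut on op-node's timer
	Derivation             // op-node supplies the exact ordered tx list
)

// BlockTxs returns the transactions for the next block depending on the mode.
// mempool is what the engine gathered; derived is what op-node derived from L1.
func BlockTxs(mode Mode, mempool, derived []string) []string {
	switch mode {
	case Sequencing:
		return mempool
	case Derivation:
		return derived // order is fixed by the batch posted to L1
	}
	return nil
}

func main() {
	fmt.Println(BlockTxs(Derivation, []string{"m1"}, []string{"d1", "d2"}))
}
```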
As noted, this section remained unchanged. The fork we used was only to run e2e tests, and we did not spend time investigating whether it'd be possible to do without a fork. Given that the op-stack is in production, it is marked entirely as green.
This component remained unchanged; however, we did run into difficulties inspecting the results of a block while it is being actively built, which was necessary for executing a series of transaction components in a single block (if a single transaction requires two components to be executed in different execution engines). This is noted below in the concerns section.
The goal of the interceptor is to intercept rpc calls from op-node directed for the execution engine and to act as a middleware which enables a sidecar process to be run. This sidecar process can then have its results fed into the block which is returned to op-node.
A learning from this process is that the base execution engine (op-geth) should be given a higher status than the sidecar process (the SDK state machine), as it is required to handle additional rollup-specific logic which we do not require from the other execution engine.
In general, the interceptor will parse user transactions, forward them to the respective execution engine based on the runtime mode (sequencing or derivation), receive back the responses from execution, generate any additional transactions to be executed, compose the block results into a single block, and relay signal instructions from op-node as necessary (start/stop block production).
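The interceptor's middleware role can be sketched as below. The `ibc/` prefix convention and all names are illustrative; a real interceptor would parse the actual transaction envelope.

```go
package main

import "fmt"

// Hypothetical sketch of the interceptor as middleware: it routes each parsed
// transaction to an engine and relays op-node control signals to both.

type Engine struct {
	Name    string
	Signals []string
}

func (e *Engine) Signal(s string) { e.Signals = append(e.Signals, s) }

type Interceptor struct {
	EVM, SDK *Engine
}

// Route picks the engine responsible for a transaction.
func (i *Interceptor) Route(tx string) *Engine {
	if len(tx) >= 4 && tx[:4] == "ibc/" {
		return i.SDK
	}
	return i.EVM
}

// Relay forwards a start/stop block production signal from op-node to both
// engines.
func (i *Interceptor) Relay(signal string) {
	i.EVM.Signal(signal)
	i.SDK.Signal(signal)
}

func main() {
	ic := &Interceptor{EVM: &Engine{Name: "op-geth"}, SDK: &Engine{Name: "sdk"}}
	fmt.Println(ic.Route("ibc/recv").Name)
	ic.Relay("start")
	fmt.Println(ic.EVM.Signals, ic.SDK.Signals)
}
```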
Conceptually what we worked on was not executing IBC transactions on a rollup, but allowing for multiple execution engines on a single rollup.
This section is marked as orange or red depending on the components which were implemented in the proof of concept.
Quick links to code:
As noted, there were minimal changes required for monomer, primarily allowing us to use a custom simapp (an issue which will be addressed in that repo). Any other concerns are noted in the concerns section below.
We deployed two contracts (IBCStandardBridge and IBCCrossDomainMessenger). The basic idea was to have an escrow contract: a contract which fulfills the bank keeper interface used by IBC transfer.
This section was not thoroughly explored and was used mostly as a placeholder.
`ParentHash`, `BlockHash`, and `PayloadID` to get the tests passing. This results in 3 hashes or identifiers for each (one for op-node, one for op-geth, one for monomer). Another approach might be to extend the existing block type with a new field dedicated to the information required by this additional execution engine to function properly.

While difficult, the proof of concept process was extremely fruitful and rewarding in terms of the information and insight it provided. We were unable to fully explore all aspects necessary in achieving our goal of IBC integration in the op-stack, but we were left with a concrete architecture diagram which acts as a starting point for this discussion. I believe the currently proposed architecture to be quite complex, and it will certainly change and adjust with additional information, inputs and perspectives.
Huge shout-out to @colin-axner, @srdtrk and @DimitrisJim for the excellent work researching, designing and implementing the PoC.
After thoughtful deliberation, we have decided to put on hold the work on this approach for integration with OP stack. Due to the high level of complexity of the solution that has been investigated, the unknowns that remain, and the difficulty of generalising this solution to other frameworks/ecosystems, we have decided to explore a different approach and see where that takes us.
Huge thank you to @hamdiallam for walking me through the OP Stack, @AdityaSripal for providing feedback and documenting his thoughts on integration and to @PolymerDAO for discussing their POC with us and providing a path forward to integration!
Context
I will do my best to outline my understanding of how the OP Stack functions. My knowledge is still limited, so please correct me if I have any wrong assumptions.
Flow
At a very high level, an optimistic chain will use its op-node process to generate a blockchain. The batcher makes an RPC request for transactions created by this unsafe blockchain and will send these batched L2 transactions to the L1 as an ordered list of transactions for L2 block execution. The execution engine (typically op-geth) has its own mempool, which can accept user transactions for inclusion in a block.
The op-node will listen for transaction submissions by the batcher in the BatchInbox on the L1. It will derive these compressed transactions into a list of transactions which are then executed by the execution layer (commonly op-geth).
The execution layer will execute these transactions and when requested by the op-node, it will commit a block.
In a separate process, the op-proposer will query an L2 rpc (the sequencer rpc) for finalized blocks which are available to be submitted back to the L1 as a block hash.
The op-batcher, op-proposer, op-node and execution engine are all separate services.
Terms:
- unsafe head: the head of the rollup chain which contains blocks that cannot (yet) be derived from the L1.
- safe head: the head of the rollup chain which can be derived from the L1, but whose blocks on the L1 have not been finalized.
- finalized head: the head of the rollup chain which only contains blocks that can be derived from finalized blocks of the L1.
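The three labels can be read as a classification of an L2 block by the status of its L1 data, which a small sketch makes concrete. The names here are illustrative, not op-node types.

```go
package main

import "fmt"

// Illustration of the unsafe/safe/finalized head labels defined above,
// expressed as a classification by L1 data status.

type L1Status int

const (
	NotOnL1       L1Status = iota // batch not yet posted to L1
	OnL1                          // batch posted; L1 block not yet finalized
	FinalizedOnL1                 // batch posted in a finalized L1 block
)

func HeadLabel(s L1Status) string {
	switch s {
	case NotOnL1:
		return "unsafe"
	case OnL1:
		return "safe"
	case FinalizedOnL1:
		return "finalized"
	}
	return "unknown"
}

func main() {
	fmt.Println(HeadLabel(OnL1))
}
```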
op-batcher
The op-batcher batches and compresses transactions which it submits to the L1 via a regular eth send to a prespecified address (0 eth is sent with the compressed transaction as input data). It will pull transactions from the op-node's unsafe head.
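The shape of the submission can be sketched as below: a 0-value transaction to a prespecified inbox address carrying the compressed batch as input data. The tx struct and address are illustrative, and raw zlib over concatenated transactions is a stand-in for the op-stack's actual channel/frame encoding.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// Sketch of a batcher submission: compress the batch and attach it as
// calldata on a 0-value transaction to the batch inbox address.

type L1Tx struct {
	To    string
	Value uint64 // always 0 for batch submissions
	Data  []byte // compressed batch as input data
}

func SubmitBatch(inbox string, batch [][]byte) (L1Tx, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	for _, tx := range batch {
		if _, err := w.Write(tx); err != nil {
			return L1Tx{}, err
		}
	}
	if err := w.Close(); err != nil {
		return L1Tx{}, err
	}
	return L1Tx{To: inbox, Value: 0, Data: buf.Bytes()}, nil
}

func main() {
	tx, _ := SubmitBatch("0xBatchInbox", [][]byte{[]byte("tx1"), []byte("tx2")})
	fmt.Println(tx.To, tx.Value, len(tx.Data) > 0)
}
```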
op-proposer
The op-proposer will periodically query a RollupClient for blocks at a given height. The block returned, OutputResponse, is a type defined by optimism which allows for easy construction of the L2 Output Commitment, that is the output hash submitted to the L1.
The op-proposer can be configured to submit safe or finalized blocks.
op-node
The op-node contains code which will parse batches posted by the batcher on the L1, and it will invoke the execution engine to produce safe blocks.
derivation
Transactions may be submitted to an L2 via the batcher or directly to the L1 in the case of censorship. The derivation code will derive all transactions to be executed from the L1 block by filtering for deposit transactions and batch submissions by the batcher address.
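The filtering step can be sketched as below: from all L1 transactions in a block, keep only deposits (sent to the deposit contract) and batch submissions (sent from the batcher address to the batch inbox). Addresses and the transaction shape are hypothetical.

```go
package main

import "fmt"

// Illustrative sketch of the derivation filter described above.

type L1Tx struct {
	From, To string
}

// FilterForL2 keeps only the L1 transactions relevant to L2 derivation:
// batch submissions from the batcher and deposit transactions.
func FilterForL2(txs []L1Tx, batcher, inbox, depositContract string) []L1Tx {
	var out []L1Tx
	for _, tx := range txs {
		isBatch := tx.From == batcher && tx.To == inbox
		isDeposit := tx.To == depositContract
		if isBatch || isDeposit {
			out = append(out, tx)
		}
	}
	return out
}

func main() {
	txs := []L1Tx{
		{From: "0xbatcher", To: "0xinbox"},  // batch submission: kept
		{From: "0xalice", To: "0xbob"},      // unrelated L1 tx: dropped
		{From: "0xalice", To: "0xportal"},   // deposit: kept
	}
	fmt.Println(len(FilterForL2(txs, "0xbatcher", "0xinbox", "0xportal")))
}
```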
Here is where deposits are derived, and here is an example of a system transaction being derived and created based on the L1 block information.
The PreparePayloadAttributes will return a modified payload attribute type with extensions required for op-geth. Specifically, it will instruct geth to treat this payload as an "empty" block, but it includes a list of transactions which will be forced into the block. This is the modification required to swap out the geth mempool for the batcher.
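The "empty block plus forced transactions" behavior can be sketched with a struct along the following lines. The field names are approximate and the struct is illustrative, not the actual op-geth payload-attributes type.

```go
package main

import "fmt"

// Sketch of the payload-attributes extension described above: a flag telling
// geth to skip its mempool, plus the forced, ordered tx list derived from L1.

type PayloadAttributes struct {
	Timestamp    uint64
	NoTxPool     bool     // build an "empty" block as far as the mempool goes
	Transactions [][]byte // txs forced into the block, in order
}

// BlockTxs models the engine's tx selection under the extension.
func BlockTxs(attrs PayloadAttributes, mempool [][]byte) [][]byte {
	if attrs.NoTxPool {
		return attrs.Transactions // mempool swapped out for the batcher's list
	}
	return append(attrs.Transactions, mempool...)
}

func main() {
	attrs := PayloadAttributes{NoTxPool: true, Transactions: [][]byte{[]byte("d1")}}
	fmt.Println(len(BlockTxs(attrs, [][]byte{[]byte("m1")})))
}
```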
invoking engine (execution layer)
The op-node will drive the execution engine by feeding it transactions, which then produces safe (and eventually finalized) blocks which the proposer can query and submit.
When building a block, the op-node will call StartPayload (start block) which will instruct op-geth to create a block with the provided list of transactions. Note that this required a modification to geth. As noted earlier, geth is creating an "empty" block as far as it knows, but op-geth has modified it to allow for a pre-determined list of transactions to be included in the building process.
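The block-building sequence described in this section can be sketched with a mock engine that just records the calls. Method names follow the text; the real engine API signatures differ.

```go
package main

import "fmt"

// Mock engine for illustrating the op-node -> engine call sequence.
type MockEngine struct{ calls []string }

func (e *MockEngine) ForkchoiceUpdate(withAttrs bool) string {
	e.calls = append(e.calls, "ForkchoiceUpdate")
	if withAttrs {
		return "payload-id-1" // starting a build returns a PayloadID
	}
	return ""
}

func (e *MockEngine) GetPayload(id string) string {
	e.calls = append(e.calls, "GetPayload")
	return "block-for-" + id
}

func (e *MockEngine) NewPayload(block string) {
	e.calls = append(e.calls, "NewPayload")
}

// BuildBlock drives one block through the engine API sequence.
func BuildBlock(e *MockEngine) string {
	id := e.ForkchoiceUpdate(true) // start building with the forced tx list
	block := e.GetPayload(id)      // retrieve the built block
	e.NewPayload(block)            // insert it into the chain
	e.ForkchoiceUpdate(false)      // advance the head to the new block
	return block
}

func main() {
	e := &MockEngine{}
	fmt.Println(BuildBlock(e), e.calls)
}
```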
The transactions derived for an L2 block are provided to the engine via the `engine.ForkchoiceUpdate` API. This will return the `PayloadID` for the block being built. Once the block has been built, the op-node will call `CompleteBuildingBlock`. This will incur two API calls to the engine: `engine.GetPayload`, followed sequentially by `engine.NewPayload`.

From my understanding, `newPayload` inserts a block into the eth chain without updating the safe head. Thus, after calling `NewPayload`, the op-node will update the reference to the chain's head and ask the engine to finalize the update via another call to `engine.ForkchoiceUpdate`.

Integration approaches
Based on my investigation, it appears there are three potential approaches to integrating ibc-go:
Requirements
At the time of this writing, I have not had time to look into how fraud proofs would work with ibc-go. Based on discussions with external parties, the current working understanding is that it should be possible to reuse the work Optimism has done to cross-compile op-geth into MIPS, which can then be used with the Cannon contract to run fraud proofs.
For now, it will be assumed that while potentially challenging, it should be possible to reuse the existing work.
Approach 1: Embedding ibc-go into the execution engine
In order to embed ibc-go into op-geth, two approaches could be taken:
Option (1) allows for trivial support of the existing work to reuse fraud proofs.
Option (2) may allow for easy reuse of fraud proofs, but may require modifications to ibc-go to make it cross compilation compatible (unknown assumption).
The complexity of modifying op-geth to support hooks into ibc-go is currently unclear to me. I suspect this work to be non-trivial, as we would need to ensure the provable state used by ibc-go is included in the output hash. In addition, it is likely desirable to build in bi-directional communication between smart contracts and ibc-go.
Modifying op-geth to communicate with ibc-go is very similar to the following approaches. The primary trade-off it makes is being tightly coupled with op-geth. Based on my current understanding, I believe this may be technically difficult and likely a poor abstraction, but I can understand it to be an adoption-based benefit.
Approach 2: Side by side execution
The following quotes are from a write up by @AdityaSripal on the side by side execution approach:
This approach is the same as outlined in @notbdu's writeup on Virtual IBC and IBC outposts. As Bo notes:
2a: Polymer (delegated chain)
If chains are unwilling to adopt an execution engine which places ibc-go and op-geth side by side, then they may opt to delegate this responsibility to Polymer. They also may opt to delegate to Polymer if the side by side execution approach is not available yet.
2b: Single rollup
As noted above, in order for the target state machine and ibc-go to communicate, we will need to verify event emission between the two state machines to pass authenticated messages.
There are two approaches to this:
In practice, chains which delegate IBC responsibilities will need to utilize async message passing while chains which run IBC in-process can synchronously validate relay messages before committing a block.
An important aspect to note is that most execution engines (geth, ABCI) will execute the full list of transactions once rather than continually accepting transactions to build a block. We need to take this into consideration because we likely need to atomically submit an ordered list of transactions (submitted by the batcher) for each execution engine. We cannot easily execute a single transaction at a time and relay the result to the other execution engine.
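This constraint can be sketched as below: each engine takes its full ordered list in one call, so the batch must be partitioned up front rather than relayed one transaction at a time. Names and the `ibc/` prefix convention are illustrative.

```go
package main

import "fmt"

// ExecuteOnce models an engine that executes the whole ordered list in a
// single call, as geth and ABCI effectively do per block.
func ExecuteOnce(engine string, txs []string) []string {
	results := make([]string, len(txs))
	for i, tx := range txs {
		results[i] = engine + ":" + tx
	}
	return results
}

// RunBlock partitions the batch and hands each engine its list atomically,
// preserving the batcher's ordering within each engine.
func RunBlock(batch []string) (evmResults, sdkResults []string) {
	var evm, sdk []string
	for _, tx := range batch {
		if len(tx) >= 4 && tx[:4] == "ibc/" {
			sdk = append(sdk, tx)
		} else {
			evm = append(evm, tx)
		}
	}
	return ExecuteOnce("evm", evm), ExecuteOnce("sdk", sdk)
}

func main() {
	evm, sdk := RunBlock([]string{"transfer", "ibc/recv"})
	fmt.Println(evm, sdk)
}
```

Note what the sketch cannot do: there is no point at which a single transaction's result can be fed from one engine to the other mid-block, which is exactly the limitation discussed above.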
Let's look at some other aspects that will come into play for the single rollup approach.
Optimistic bi-directional communication
It is likely desirable to enable IBC applications to be written natively in the target state machine (op-geth). In this case, we need to ensure that smart contracts can send authenticated messages to ibc-go and ibc-go can send authenticated messages to smart contracts.
At a high level, here is an example where we would like bi-directional communication:
In this flow, we can note a limitation. Each state machine executes all its transactions for a block once, but step (5) requires ibc-go to first execute steps 1 and 2 and op-geth to execute step 4. This would require 3 rollup blocks to be produced to perform the receive packet step (which normally executes entirely within a single transaction within ibc-go).
A potential solution to this multi-block delay is to optimistically accept transactions, such as step (5), whose validity will be proved in the final step of committing the block. In situations where the batcher is acting honestly, the IBC flow will have low-latency transactions. In the situation where the batcher acts maliciously, the block is marked as invalid, which is the same result as the batcher withholding batch submissions.
Thank you Aditya for this ingenious idea.
Post Handler Verification
Once both state machines (op-geth and ibc-go) have executed their blocks, a post-verification handler can be run on the ordered list of transactions for each respective block. If the list contains cross-domain relay messages, a check should be performed to ensure that the prerequisite event was emitted on the necessary state machine. This post verification only needs to be performed for the (2) synchronous relay message approach.
Using the example in the optimistic bi-directional communication section, the following checks would need to be performed before the op-geth and ibc-go blocks can be finalized:
If any of the post verification steps fails, the blocks produced must be aborted and an empty invalid block must be produced in its place.
Note: This behavior must be verified. I am unsure what op-stack currently does if the batcher submits an invalid batch. My assumption is that the first block produced in an epoch should only contain deposits and system transactions, that is non-batcher submitted transactions, ensuring that batcher misbehavior does not affect critical components of the rollup guarantees.
Note: events emitted in SDK/CometBFT chains are currently non-deterministic. We would need to ensure we are emitting provable events (i.e. consistent across all nodes/patch releases).
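The post-verification check described above can be sketched as a simple membership test: every cross-domain relay message in a block must have a matching prerequisite event emitted on the counterpart state machine, or the block pair is aborted. Types here are illustrative.

```go
package main

import "fmt"

// RelayMsg stands in for a cross-domain relay message carried in a block,
// identified by the event that must justify it.
type RelayMsg struct{ EventID string }

// VerifyBlock returns true only if every relay message's prerequisite event
// appears in the set of events emitted by the counterpart state machine.
// On failure the blocks produced must be aborted.
func VerifyBlock(relays []RelayMsg, emitted map[string]bool) bool {
	for _, r := range relays {
		if !emitted[r.EventID] {
			return false
		}
	}
	return true
}

func main() {
	emitted := map[string]bool{"send-packet-1": true}
	fmt.Println(VerifyBlock([]RelayMsg{{EventID: "send-packet-1"}}, emitted))
	fmt.Println(VerifyBlock([]RelayMsg{{EventID: "send-packet-2"}}, emitted))
}
```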
Cross Domain Messenger
Communication with smart contracts may occur by utilizing the cross domain messenger. All `sendMessage` transactions which come from ibc-go must be validated by proving the corresponding event was emitted. Once the message is "sent", a user can submit a `relayMessage` which will pass the message to the appropriate contract.

Approach 3: Pre Engine Invocation Execution
Another approach is to execute ibc-go logic before the execution engine is invoked.
To achieve this, derivation code would be added to op-node/rollup/derive which parses an ibc-go transaction from the batcher-submitted batch. The ibc-go transaction would be passed to ibc-go via an RPC call. The response to this RPC call would be translated into an op-geth transaction which would use a predeploy IBC contract to write state into a `provableStore`.

This is very similar to the approaches listed above, except that it makes its modifications in the derivation code. The primary benefit of this approach is that it mainly requires modifications to the derivation layer. One downside is that you would be performing ibc-go fraud proofs in the derivation code. In addition, it is not logically natural to perform execution at the derivation layer, whose primary purpose is uncompressing transactions submitted by the batcher and reading transactions submitted via the DA (data availability) layer. Another downside is that it requires a special ibc-go transaction for each execution engine (though for the moment op-geth is the primary execution engine).
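The translation step in this flow can be sketched as below. The predeploy address, the `setProvable` calldata shape, and all types are hypothetical.

```go
package main

import "fmt"

// Illustrative predeploy address for the hypothetical IBC contract.
const ibcPredeploy = "0x4200000000000000000000000000000000000042"

// IBCResponse stands in for the result of the ibc-go RPC call made by the
// derivation layer.
type IBCResponse struct {
	Key, Value string // state to commit into the provable store
}

// GethTx stands in for the op-geth transaction the response is translated into.
type GethTx struct {
	To   string
	Data string
}

// Translate turns an ibc-go execution result into the predeploy call that
// writes it into op-geth state.
func Translate(resp IBCResponse) GethTx {
	return GethTx{
		To:   ibcPredeploy,
		Data: "setProvable(" + resp.Key + "," + resp.Value + ")",
	}
}

func main() {
	tx := Translate(IBCResponse{Key: "commitments/1", Value: "0xabc"})
	fmt.Println(tx.To == ibcPredeploy, tx.Data)
}
```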
Thank you Hamdi for noting this clever approach.
Considerations
We should have high confidence in our ability to restructure our approach to OP Stack integration. It is better to have an insufficient solution that can be modified over time than to decide on an unmovable solution. We should be able to rework what components do what without fracturing ics20 liquidity. We should also be able to reuse existing work when updating to newer solutions.
As noted by Aditya:
This will be partially achievable via channel upgradability which will allow changing of connections associated with a channel.
Conclusion
Given the above information, I believe approach (2), creating an environment where multiple execution engines can be supported within a single execution layer (and represented under a single output hash), makes for a sound technical abstraction that can likely be extended to other frameworks as well. This also happens to align with Polymer's conclusion as outlined in the blog post Exploring Virtual Blockchain Design written by @notbdu. Bo does an excellent job visualizing the application/server abstraction and analogizing it to web2 design, showing us that this is a design model which can easily extend into other ecosystems.
In addition, we will want to take inspiration from approach (1), which would support implementation of IBC components in other execution engines.
To achieve this, we will need to define standards for facilitating bi-directional communication between execution engines: specifically, how we expect asynchronous relay messages (messages sent in past rollup blocks) and synchronous relay messages (messages sent in the same rollup block) to be authenticated (proved). In addition, we should define the endpoints with which these relay messages interact to facilitate the next IBC processing step.
To create a multi-execution engine environment, we can start by implementing our own execution engine which embeds op-geth (taking care to handle the single output hash elegantly). I expect it to be possible to add additional abstractions to the op-node which would allow for a standard way to extend the execution layer with multiple execution engines.