feat: add entrypoints.md

Description

Continues https://github.com/ethereum-optimism/design-docs/pull/155

This design document introduces Entrypoint contracts as a new primitive that allows anyone to add custom logic on top of the L2ToL2CrossDomainMessenger. It generalizes the L2ToL2CrossDomainMessenger design and unlocks other interop primitives such as message batching and message expiration. To do so, it adds a parameter to the L2ToL2CrossDomainMessenger that binds relaying to a particular address, where the custom logic can live.
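A minimal Python model of the binding idea, not contract code. It assumes the messenger stores an optional `entrypoint` address per message and only lets that address call relayMessage. The field name, the zero-address convention, and the error types are hypothetical illustrations, not the actual L2ToL2CrossDomainMessenger interface.

```python
from dataclasses import dataclass

ZERO = "0x0"  # hypothetical "no entrypoint" value: anyone may relay


@dataclass(frozen=True)
class Message:
    source: int
    destination: int
    nonce: int
    sender: str
    target: str
    payload: str
    entrypoint: str = ZERO  # hypothetical: the only address allowed to relay this message


class Messenger:
    """Toy model of the relay-side checks; not the real contract."""

    def __init__(self, chain_id: int):
        self.chain_id = chain_id
        self.successful: set[Message] = set()
        self.delivered: list[tuple[str, str]] = []

    def relay_message(self, caller: str, msg: Message) -> None:
        if msg.destination != self.chain_id:
            raise ValueError("wrong destination")
        if msg in self.successful:
            raise ValueError("already relayed")
        # The new check: a bound message can only be relayed through its entrypoint.
        if msg.entrypoint != ZERO and caller != msg.entrypoint:
            raise PermissionError("caller is not the entrypoint")
        self.successful.add(msg)
        self.delivered.append((msg.target, msg.payload))
```

Unbound messages behave as before, so in this model the extra parameter only restricts *who* relays and leaves *what* gets delivered unchanged.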

Regarding expiring messages: ethereum-optimism/specs#460

Perfect. I feel like these are two separate types of problems.

The one presented in this PR is about messages that fail when relayMessage gets called.
The one you mention is for messages that can't even reach that stage. Both are important to address. We might find it convenient to use different names.

Hi guys, I'm Sergey, a research scientist at Lisk. I met Skeletor at Devcon and he directed me to these specs, since I was very curious about making OP Interop more user-friendly.

I think this design is a step in the right direction toward making Interop more usable by actual applications. However, I have a couple of practical questions and some more abstract ones about this design. I will start with the specific ones:

I am curious how the swap example will work with the actual SuperchainTokenBridge. This goes in the direction of the question above by @mds1. Namely, which messages (if any) will be emitted by the bridge itself? Will the bridge smart contract have to be modified to know about the entrypoint?
How could trustless incentivized relaying work with entrypoints? (I have message m and tokens t on chain A, and I want m to be relayed to chain B for a fee of t.) Would all relayers use the same entrypoint? How many transactions / messages will have to be sent on chain A and on chain B?

In general, I see that entrypoints enable all the nice use cases, but I am not convinced that sending several messages that go through the entrypoint and also target the entrypoint is the simplest design. However, maybe we cannot do much better, because sending tokens and doing an action (swap) have to happen in two different cross-chain messages...

Hey Sergey! Thank you for taking the time to read through this design doc.

I realized the example could have been better. I just pushed a large refactor of the design doc, and it should be clearer now. I removed the swap example altogether and focused on specific use cases and the expired messages flow (which you can check in detail here).
This is a very cool point. I imagine "paymasters" would use an entrypoint that explicitly deducts funds from the user before or after relaying the message. The user would need to grant an approval to this contract on the destination chain. Multiple paymasters can share the same entrypoint. The entrypoint here would just add fee logic, without requiring the user to change the target or set the amounts on the origin chain.
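A rough Python model of the paymaster flow, not contract code. It assumes a minimal ERC20-like ledger on the destination chain, a flat `fee` paid to whoever relays, and that the user controls the same address on both chains. `PaymasterEntrypoint`, `fee`, and the charge-after-relay ordering are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    nonce: int
    sender: str      # the user on the origin chain
    target: str
    payload: str
    entrypoint: str  # address bound to relay this message


class Token:
    """Minimal ERC20-like ledger on the destination chain."""

    def __init__(self):
        self.balances: dict[str, int] = {}
        self.allowances: dict[tuple[str, str], int] = {}

    def approve(self, owner: str, spender: str, amount: int) -> None:
        self.allowances[(owner, spender)] = amount

    def transfer_from(self, spender: str, owner: str, to: str, amount: int) -> None:
        if self.allowances.get((owner, spender), 0) < amount:
            raise ValueError("insufficient allowance")
        if self.balances.get(owner, 0) < amount:
            raise ValueError("insufficient balance")
        self.allowances[(owner, spender)] -= amount
        self.balances[owner] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


class Messenger:
    def __init__(self):
        self.relayed: set[Message] = set()

    def relay_message(self, caller: str, msg: Message) -> None:
        if msg in self.relayed:
            raise ValueError("already relayed")
        if caller != msg.entrypoint:
            raise PermissionError("caller is not the entrypoint")
        self.relayed.add(msg)


class PaymasterEntrypoint:
    def __init__(self, address: str, messenger: Messenger, token: Token, fee: int):
        self.address, self.messenger, self.token, self.fee = address, messenger, token, fee

    def relay(self, relayer: str, msg: Message) -> None:
        payer = msg.sender  # assumes the user owns the same address on the destination chain
        # Check the fee is payable before mutating state (on-chain, a revert would undo everything).
        if self.token.allowances.get((payer, self.address), 0) < self.fee:
            raise ValueError("user has not approved the fee")
        if self.token.balances.get(payer, 0) < self.fee:
            raise ValueError("user cannot cover the fee")
        self.messenger.relay_message(self.address, msg)
        self.token.transfer_from(self.address, payer, relayer, self.fee)
```

In this model the user never sets a fee on the origin chain. The approval on the destination chain is the only extra step, and any relayer that routes through the entrypoint collects the fee.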

I agree that using the entrypoint as a target is overkill. I recommend reading the design document now; it should be much simpler.

The updated version is much clearer, especially without the swap example. I think it was a bit confusing because ultimately this logic can be more easily implemented with a target smart contract rather than an entrypoint.

However, now I feel that the entrypoint design is skewed a bit too much toward expiring messages. I think the discussion of context that you had before was important. Here is an example of ordered relays to illustrate it:

In order to implement message batching / guaranteed ordered execution, the destination chain needs to know the message dependencies, or which batch the message belongs to. A natural solution would be for the entrypoint to check this ordering condition, but the entrypoint needs some additional info besides the message itself (e.g. the list of msg hashes to be relayed before this message). So this context has to be passed to the entrypoint somehow.
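A small Python model of the ordering check, not contract code. It assumes the context for a message is simply the list of message hashes that must already be relayed, and it uses SHA-256 as a stand-in for the on-chain message hash. The model trusts whatever context the caller passes, which is exactly the gap discussed here: how the context is bound to the message is left open.

```python
import hashlib


def msg_hash(payload: str) -> str:
    # Stand-in for the messenger's message hash (keccak256 over the encoded message on-chain).
    return hashlib.sha256(payload.encode()).hexdigest()


class OrderedEntrypoint:
    """Hypothetical entrypoint that enforces dependencies passed as context."""

    def __init__(self):
        self.successful: set[str] = set()  # mirrors successfulMessages on the messenger
        self.executed: list[str] = []

    def relay(self, payload: str, must_follow: list[str]) -> None:
        """Relay `payload` only if every hash in `must_follow` (the context) was already relayed."""
        h = msg_hash(payload)
        if h in self.successful:
            raise ValueError("already relayed")
        missing = [d for d in must_follow if d not in self.successful]
        if missing:
            raise RuntimeError(f"{len(missing)} dependency(ies) not relayed yet")
        self.successful.add(h)
        self.executed.append(payload)
```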

Alternative proposal

I have an idea of an alternative entrypoint approach, but for now I struggle to see whether it solves all the problems that your design solves.

A cross-chain entrypoint (CCEntrypoint) is deployed on both chains A and B with the same code and the same address.
CCEntrypoint on chain A calls sendMessage with a wrapped call to the CCEntrypoint on the other side. Note that this message can contain both a context for CCEntrypoint on the other side and a wrapped calldata for some other contract on chain B (a bit like internet protocol layering). So any user cross-chain message A -> B that needs the entrypoint functionality must go via the entrypoint on chain A.
L2ToL2CDM on chain B is still the first stop for the message processing, as in the current interop design. relayMessage will unpack and make the corresponding call to CCEntrypoint on chain B.
CCEntrypoint on chain B can guarantee that the message originated from the other CCEntrypoint on chain A via crossDomainMessageSender (so some static validation can even be done only on chain A). It can also contain arbitrary validation and execution logic.
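A Python model of the CCEntrypoint layering, not contract code. It assumes both entrypoints share one address and that the messenger exposes the origin-side sender while relaying, as crossDomainMessageSender does. One shared `Messenger` object stands in for both chains, JSON stands in for ABI encoding, and the method names are hypothetical.

```python
import json
from typing import Callable, Optional

CC_ENTRYPOINT = "0xcc"  # assumed: same address on chain A and chain B


class Messenger:
    """Toy L2ToL2CDM, shared by both chains for brevity."""

    def __init__(self):
        self.outbox: list[tuple[str, str, str]] = []  # (sender, target, data)
        self.handlers: dict[str, Callable[[str], None]] = {}
        self.cross_domain_message_sender: Optional[str] = None

    def send_message(self, sender: str, target: str, data: str) -> None:
        self.outbox.append((sender, target, data))

    def relay_message(self, index: int) -> None:
        sender, target, data = self.outbox[index]
        self.cross_domain_message_sender = sender
        try:
            self.handlers[target](data)
        finally:
            self.cross_domain_message_sender = None


class CCEntrypoint:
    def __init__(self, messenger: Messenger, targets: dict[str, Callable[[str], None]]):
        self.messenger = messenger
        self.targets = targets  # contracts reachable on this chain

    # Origin side (chain A): wrap the context and the user call into one message to the peer.
    def send(self, context: dict, target: str, calldata: str) -> None:
        data = json.dumps({"context": context, "target": target, "calldata": calldata})
        self.messenger.send_message(CC_ENTRYPOINT, CC_ENTRYPOINT, data)

    # Destination side (chain B): invoked by the messenger during relayMessage.
    def receive(self, data: str) -> None:
        if self.messenger.cross_domain_message_sender != CC_ENTRYPOINT:
            raise PermissionError("not sent by the peer CCEntrypoint")
        envelope = json.loads(data)
        # Arbitrary validation against envelope["context"] would go here.
        self.targets[envelope["target"]](envelope["calldata"])
```

The sender check is what lets chain B trust any validation already done on chain A. A copy of the same payload sent by any other address is rejected.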

Pros of this design:

No change to current interop protocol.
Can pack entrypoint context into the cross chain message together with the original message to be executed.

Cons:

Bridging transactions could not be routed via an entrypoint, since SuperchainTokenBridge directly uses L2ToL2CDM to pass bridging messages. However, I think your design also requires some modifications to the bridging protocol.
The cross-chain entrypoint will have to know, and probably verify, some details of cross-chain messaging that are handled by L2ToL2CDM, e.g. how to execute a message. This introduces code duplication, which is not a very satisfying design.

Generally I am curious to hear what you think about this idea. I hope our exchange will make OP Interop as well-designed as possible!

I will add the context discussion again. I removed it because I was afraid it might introduce too much complexity, but I agree it's a very important argument for entrypoints.

Regarding the alternative design you suggested: I like the approach, and I think it's a good way of achieving local checks and storage (including expired messages). It's also user-friendly, since encoding and decoding are managed by the contracts.

Some things that come to mind here are:

This design doesn't inherently solve batching (message ordering) unless additional abstractions are added (like receive-and-wait mechanisms, which would require multiple transactions).
Encoding information within the message makes composability harder. Changing the processing order among multiple entrypoints would need flexible encoding methods, which aren't straightforward. This might be easier with a "permissioned" entrypoint where the message remains unaltered, and we use context to create a chain of permissions on the destination chain (forcing a specific order among entrypoints).

After considering your design, I believe the strongest argument for introducing entrypoints is enabling message bundling, i.e. enforcing a specific order for messages on the destination chain.

I will try to modify the design doc once again with this, and would love to keep this discussion open.

@sergeyshemyakov I've given your suggestion further thought, and here's my perspective:

While it's technically possible to achieve most of the use cases outlined in the design document by encoding additional information in an Initiator contract (or manually) on the origin chain and then sending messages to a dedicated Receiver contract on the destination chain for decoding and execution, this method significantly complicates the developer experience.

Advantages of using entrypoints:

Simplified Logic Delegation: Instead of embedding complex conditions into the message or the target contract, users can delegate this logic to the entrypoint. This separation of concerns keeps the message payload and target contract focused on their core functionalities, enhancing simplicity and composability.
Improved Developer Experience: Developers don't need to handle intricate encoding and decoding processes or manage intermediate contracts that handle user funds and approvals.
Composability: By offloading additional logic to the entrypoint, it's easier to compose and reuse components across different cross-chain interactions. With encodings, this becomes harder.

For message bundles, entrypoints offer a much better user flow. While it's possible to process bundles using Receiver contracts without entrypoints, this approach requires:

Receiving each relayed message individually.
Waiting for the entire bundle to be received and stored.
Executing a final transaction to process all messages collectively (resulting in N+1 relay transactions instead of just one).

This not only adds complexity but also increases the potential for errors and delays.
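A toy Python comparison of the two flows that counts relay transactions for an N-message bundle. Both classes are hypothetical and ignore gas, storage costs, and message authentication. The point is only the N + 1 versus 1 count.

```python
class ReceiverWithoutEntrypoint:
    """Hypothetical Receiver: each message is relayed on its own, then a final call runs the bundle."""

    def __init__(self, bundle: list[str]):
        self.bundle = bundle
        self.received: set[str] = set()
        self.executed = False
        self.txs = 0

    def on_relay(self, msg: str) -> None:  # one relay transaction per message
        self.txs += 1
        if msg not in self.bundle:
            raise ValueError("unknown message")
        self.received.add(msg)

    def execute(self) -> None:  # the extra (N+1)-th transaction
        self.txs += 1
        if self.received != set(self.bundle):
            raise RuntimeError("bundle incomplete")
        self.executed = True


class BundleEntrypoint:
    """Hypothetical entrypoint: relays the whole bundle in one transaction, or nothing."""

    def __init__(self, bundle: list[str]):
        self.bundle = bundle
        self.executed: list[str] = []
        self.txs = 0

    def relay_bundle(self, msgs: list[str]) -> None:
        self.txs += 1  # on-chain: one tx making N relayMessage calls
        if msgs != self.bundle:
            raise ValueError("bundle mismatch")
        self.executed.extend(msgs)
```

The Receiver also has to hold intermediate state between transactions, which is where the extra room for errors and delays comes from.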

Example: two `SuperchainERC20` transfers

With `entrypoint`

The entrypoint has a single relay function that takes three messages as inputs: two messages corresponding to crosschain transfers (calls to the SuperchainTokenBridge) and a Context event. The Context event encodes information linking the two transfers, preventing someone from passing a wrong pair. The entrypoint, on the destination chain, checks that the Context was emitted by an approved address.

This approach ensures the token transfers cannot be relayed independently. The Context could even have been emitted later on, as long as each transfer referenced the entrypoint.
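A Python model of the checks this entrypoint performs, not contract code. It assumes the Context event reaches the entrypoint as a record carrying its emitter and the two message hashes it links, and that both transfer messages were sent bound to this entrypoint, so the messenger rejects relaying them directly. `TransferPairEntrypoint` and `approved_emitter` are hypothetical names, and SHA-256 stands in for keccak256.

```python
import hashlib
from dataclasses import dataclass


def h(data: str) -> str:
    # Stand-in for keccak256 over the encoded message.
    return hashlib.sha256(data.encode()).hexdigest()


@dataclass(frozen=True)
class Context:
    emitter: str    # address that emitted the Context event
    msg_hash1: str
    msg_hash2: str


class TransferPairEntrypoint:
    def __init__(self, approved_emitter: str):
        self.approved_emitter = approved_emitter
        self.relayed: set[str] = set()

    def relay(self, transfer1: str, transfer2: str, ctx: Context) -> None:
        # 1. The Context must come from the approved address.
        if ctx.emitter != self.approved_emitter:
            raise PermissionError("untrusted Context emitter")
        # 2. The Context must link exactly this pair, so a wrong pair cannot be passed.
        if (h(transfer1), h(transfer2)) != (ctx.msg_hash1, ctx.msg_hash2):
            raise ValueError("transfers do not match Context")
        if h(transfer1) in self.relayed or h(transfer2) in self.relayed:
            raise ValueError("already relayed")
        # 3. Both transfers are relayed in the same call (on-chain: two relayMessage calls in one tx).
        self.relayed.update({h(transfer1), h(transfer2)})
```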

Without `entrypoint`

It is possible to achieve the same end result by using dedicated encodings and decodings on each chain, but the flow is quite complex and requires these intermediate contracts to handle user funds and approvals. An example encoder contract on the origin chain would look like this:

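```solidity
// Assumes IERC20 (e.g. from OpenZeppelin) and a SuperchainTokenBridge interface are imported,
// and that `superchainTokenBridge` and `receiverAddr` are state variables of this contract.
event Context(
    address sender,
    address token1,
    address to1,
    uint256 amount1,
    address token2,
    address to2,
    uint256 amount2,
    bytes32 msgHash1,
    bytes32 msgHash2
);

function sendTokens(
    address token1,
    address to1,
    uint256 amount1,
    uint256 chain1,
    address token2,
    address to2,
    uint256 amount2,
    uint256 chain2
) public {
    // Transfer token1 and token2 from msg.sender to this contract, so it can bridge them
    IERC20(token1).transferFrom(msg.sender, address(this), amount1);
    IERC20(token2).transferFrom(msg.sender, address(this), amount2);

    // Send token1 to the Receiver contract on chain1 using SuperchainTokenBridge
    bytes32 msgHash1 = superchainTokenBridge.sendERC20(token1, receiverAddr, amount1, chain1);

    // Send token2 to the Receiver contract on chain2 using SuperchainTokenBridge
    bytes32 msgHash2 = superchainTokenBridge.sendERC20(token2, receiverAddr, amount2, chain2);

    // Emit Context with the original msg.sender, the final recipients, and the msgHashes,
    // so the Receiver can link both transfers and forward funds to to1 / to2
    emit Context(msg.sender, token1, to1, amount1, token2, to2, amount2, msgHash1, msgHash2);
}
```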

Then, on the destination chain, each message can be relayed independently to receiverAddr, which will need to store each completed msgHash and expose a dedicated function that does the final transfer to the recipients.

Notice how much more complex this second flow is. Entrypoints are not unblocking new features, but they are making things way easier for devs and users.

@0xParticle Thanks for having such a deep dive into my question, appreciate your comments!

So what I am taking out of this discussion:

Entrypoint design provides simpler composability and batching logic, since you have control over how you call relayMessage on L2ToL2CDM. Thinking about my design with receiver contracts, I realized that I imagine relayers batching message execution somehow anyway, so it is nice to provide a standard solution for this.
Entrypoints can interact with SuperchainTokenBridge much more easily because they live in a different design space. My idea of a receiver contract lives on the same layer as the bridge, in the sense that both are built on top of L2ToL2CDM and are expected to be called directly from the messenger, which makes interactions complicated.
For the two reasons above, entrypoints are a superior design.

Still I have two comments from my side:

I am still not completely convinced that composability will be easy to implement for entrypoints. Imagine you have an entrypoint that allows message batching and another entrypoint that makes sure to reward the relayer for passing a message: I don't see how you could easily get a solution that both batches and incentivizes without rewriting code. That said, I do see that entrypoints are reusable in different contexts (e.g. batching messages for DEX swaps or batching messages for voting in a DAO).
The entrypoint context is paramount for almost all interesting applications, so I urge you to embed it into this proposal already, by defining a context-passing event and a clear interface for managing the context on the receiving chain.

Appendix: Further thoughts on my design

Now I am rethinking my receiver contract idea more as a cross-chain wallet, where a user "owns" a particular sender / receiver on several different chains and only they can move messages via the receiver (which can be enforced, e.g., by checking a signature on the sending side). Then it is possible to implement batching and relayer incentivization in the following way (which I have implemented here):

The receiver "cross-chain wallet" implements sequencing of messages, where every message can come with an array of msgHashes that must be successfully relayed before the message (as defined by successfulMessages on L2ToL2CDM).
The user sends a transaction on chain A that a) transfers 100 USDC tokens to the "cross-chain wallet" receiver on chain B (msg1), b) sends a message to the "cross-chain wallet" receiver on chain B to swap USDC tokens into OP once msg1 is processed (msg2), c) sends a message to the "cross-chain wallet" receiver on chain B to deploy all but 1 OP tokens into an AAVE pool once msg2 is processed (msg3), and d) sends a message to the "cross-chain wallet" receiver on chain B to pay 1 OP to tx.origin once msg3 is processed. Note that each msgHash can be computed immediately.
A relayer sees this sequence of messages and realizes that they can relay all 4 messages to chain B for a reward of 1 OP. They batch-relay all messages because they have a financial incentive to do so (if they relay the messages in 4 separate transactions, someone else can relay the final message and steal the whole reward).

As an outcome, the user bridged tokens A->B, swapped, deposited into the AAVE pool, and paid the relayer, all within one transaction. However, it is important that the user actually controls the "cross-chain wallet" receiver on chain B, in the same way as they could control any other smart contract wallet.
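A Python model of the sequencing rule, not contract code. It assumes each message carries the hashes it depends on and that the wallet checks a `successful` set standing in for `successfulMessages`. msg1 is modeled as a wallet message for brevity, although in the flow above it is a bridge transfer. The `{tx.origin}` placeholder and the string actions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WalletMessage:
    msg_hash: str
    action: str
    depends_on: list[str] = field(default_factory=list)


class CrossChainWallet:
    def __init__(self):
        self.successful: set[str] = set()  # stand-in for successfulMessages on L2ToL2CDM
        self.log: list[str] = []

    def handle(self, m: WalletMessage, tx_origin: str) -> None:
        if any(d not in self.successful for d in m.depends_on):
            raise RuntimeError(f"{m.msg_hash}: dependencies not relayed")
        self.log.append(m.action.replace("{tx.origin}", tx_origin))
        self.successful.add(m.msg_hash)


def relay_batch(wallet: CrossChainWallet, msgs: list[WalletMessage], relayer: str) -> None:
    """One relayer transaction that relays `msgs` in order; tx.origin is the relayer."""
    snapshot = (set(wallet.successful), list(wallet.log))
    try:
        for m in msgs:
            wallet.handle(m, tx_origin=relayer)
    except Exception:
        wallet.successful, wallet.log = snapshot  # emulate an EVM revert of the whole tx
        raise
```

Relaying msg1 to msg3 one by one lets anyone else relay msg4 and collect the reward. That race is the incentive to batch.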

Entrypoint Composability

I agree that composability isn't straightforward in all cases.

In the scenario you mentioned (an entrypoint that allows message batching and another that rewards relayers), I was thinking of using the Context event to encode a "chain of callers." Specifically, we can pass the address of the RelayerFeeEntrypoint as part of the Context to the BatchingEntrypoint. Then, the BatchingEntrypoint checks that msg.sender matches the decoded address of the RelayerFeeEntrypoint. This approach ensures that the only valid flow for relaying is:

User → RelayerFeeEntrypoint → BatchingEntrypoint → L2ToL2CDM

It would also be possible to encode it the other way around, as long as the check is in both entrypoints.

I believe this design is relatively simple and can be standardized; it's akin to a "meta-entrypoint" pattern. It allows us to stack entrypoints by having each one verify the caller based on the context, thus enabling composability without extensive code rewriting. In contrast, using Initiator/Receiver patterns would require the decoding method to be aware of the execution order.
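A Python model of the "chain of callers" check, not contract code. It assumes the messages were bound to `BatchingEntrypoint` at the origin and that the Context decodes to the address of the entrypoint that must call it. The class names follow the comment above. The dict-shaped Context, the `outer_entrypoint` key, and the flat fee are hypothetical.

```python
class Messenger:
    def __init__(self, bound_entrypoint: str):
        self.bound_entrypoint = bound_entrypoint  # entrypoint the messages were bound to at origin
        self.relayed: list[str] = []

    def relay_message(self, caller: str, msg: str) -> None:
        if caller != self.bound_entrypoint:
            raise PermissionError("only the bound entrypoint may relay")
        self.relayed.append(msg)


class BatchingEntrypoint:
    address = "0xbatching"

    def __init__(self, messenger: Messenger):
        self.messenger = messenger

    def relay(self, caller: str, msgs: list[str], context: dict) -> None:
        # The Context names the entrypoint that must sit directly above this one.
        if caller != context["outer_entrypoint"]:
            raise PermissionError("caller does not match Context")
        for m in msgs:
            self.messenger.relay_message(self.address, m)


class RelayerFeeEntrypoint:
    address = "0xfee"

    def __init__(self, inner: BatchingEntrypoint, fees: dict[str, int], fee: int):
        self.inner, self.fees, self.fee = inner, fees, fee

    def relay(self, relayer: str, msgs: list[str], context: dict) -> None:
        self.inner.relay(self.address, msgs, context)
        # Credit the reward only after the inner batch succeeded.
        self.fees[relayer] = self.fees.get(relayer, 0) + self.fee
```

Calling `BatchingEntrypoint` directly fails, so the only path that reaches the messenger is User → RelayerFeeEntrypoint → BatchingEntrypoint → L2ToL2CDM.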

Context enshrinement

We also think that eventually, the context should be enshrined within the protocol. However, for now, we'll start with the entrypoint alone since it's upgradeable, and we want to see how these contracts are used in practice.

Enshrining the context is indeed the best way to securely bind a message to its context, but it requires the context to be known upfront at the source—it can't be emitted later or via a call to a contract that isn't context-aware. For that reason, we'll begin without enshrining the context and have the entrypoint trust a sender for the Context event.

We want to wait a bit and see how entrypoints and context are utilized before modifying the L2ToL2CrossDomainMessenger further.

Message batching

Your approach is very clean and a clever use of the successfulMessages mapping. There are two invariants we had in mind for batching that might not be possible with this approach:

Messages SHOULD NOT be aware that they are part of a batch when they're sent from the source chain.
- In particular, users SHOULD be able to create a batch from a Batcher contract that calls intermediate contracts, where each of them calls sendMessage and is not aware that it is being called by a Batcher (a call to sendERC20 in the SuperchainTokenBridge, for instance).
It SHOULD be possible to execute an arbitrary subset of the batch (if some predefined conditions are met).
- As I see it, a batch is just a set of related transactions such that individual messages cannot be processed alone (without taking the other messages into account). That relationship can be used to enforce ordered execution as a particular case, but the messages could also have a parenthood relationship (batches inside batches) or another kind of dependency.
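A Python model of both invariants, not contract code. It assumes the batch is described only by a Context listing member message hashes, so the member messages stay unmodified and unaware of the batch. It also assumes the "predefined conditions" for partial execution are a predicate over the submitted subset. The "at least two members" rule below is purely illustrative, and SHA-256 stands in for the on-chain message hash.

```python
import hashlib
from typing import Callable


def h(msg: str) -> str:
    # Stand-in for the messenger's message hash.
    return hashlib.sha256(msg.encode()).hexdigest()


class BatchEntrypoint:
    """Hypothetical entrypoint that relays any subset of a batch satisfying a condition."""

    def __init__(self):
        self.relayed: list[str] = []

    def relay_subset(
        self,
        batch: frozenset[str],  # Context: hashes of every batch member
        subset: list[str],      # messages to relay now, exactly as they were sent
        condition: Callable[[frozenset[str], set[str]], bool],
    ) -> None:
        hashes = {h(m) for m in subset}
        if not hashes <= batch:
            raise ValueError("message outside the batch")
        if not condition(batch, hashes):
            raise PermissionError("subset does not satisfy the batch condition")
        self.relayed.extend(subset)
```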

We've been discussing two possible designs internally that modify the L2ToL2CrossDomainMessenger to support these functionalities (still a work in progress). You can check it here.

That said, if these two points aren't critical for your use case, your design is excellent, especially since it doesn't require any modifications to the L2ToL2CrossDomainMessenger. It provides an elegant solution within the existing framework.

We still need to gather feedback on the key invariants for batching.

Also, feel free to reach out via Telegram @parti0x :)
