ethereum-optimism / optimistic-specs

Optimistic: Bedrock is a protocol that strives to be an extremely simple optimistic rollup that maintains 1:1 compatibility with Ethereum

Safely generating L2 blocks based purely on L1 blockdata #14

Closed ben-chain closed 2 years ago

ben-chain commented 2 years ago

// see also discussion here

Current State of L2s

In general, the nature of a rollup is such that the L2 state can be entirely defined by the L1 blocks. That is, all possible L2s can be expressed by state transitions taking the form f(old_l2_state, new_l1_block) -> new_l2_state.
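
As a mental model only (the types and names below are purely illustrative, not actual Bedrock interfaces), that state transition can be written down as a pure function in Go:

```go
// Hypothetical sketch of the "rollup as a pure function of L1" idea; the types
// and names are illustrative, not actual Bedrock interfaces.
package rollup

// L1Block is whatever view of an L1 block the derivation needs (header fields,
// transaction calldata, receipts, ...).
type L1Block struct {
	Number uint64
	Hash   [32]byte
	TxData [][]byte // batch calldata, deposit events, etc.
}

// L2State is an opaque commitment to the full L2 state (e.g. a state root).
type L2State [32]byte

// Transition is the whole rollup: given the previous L2 state and the next L1
// block, it deterministically produces the next L2 state, i.e.
// f(old_l2_state, new_l1_block) -> new_l2_state.
type Transition func(old L2State, block L1Block) (L2State, error)
```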

In reality, most (all?) rollups today do not act on raw blockdata, but instead use an L1 smart contract to store a hash of the calldata sent to a system contract during L1 block execution. This function lived in OVM_CanonicalTransactionChain for our v1, for example. Here's where Arbitrum does it as well.

The Ideal

Our goal of EVM equivalence should make the idea of acting on raw L1 block data more feasible. In reality, a commitment to input calldata is already stored within the L1 blockhash itself, and we could use this as the verification source during the fraud proof. This has been proposed by @lightclient and others. It would give us optimal gas costs, and also set us up for ETH2p1, in which the place that calldata is submitted to will not have any smart contract capabilities.

Problem Statement

The above post proves that it's feasible to give the L2 virtual machine access to some get_l1_blockhash(number) function. However, it does not tell us how we can safely use it. Geohot has proposed one way to use this blockhash in a very generalized form: give the L2 VM a function, effectively a "get_known_preimage(bytes32 hash)", which allows it to recover the preimage of a hash into memory. The L2 VM could use this to unpack the block body from the block hash, then unpack the transaction root's children from that, and so on, until it unpacks the relevant transaction data itself.
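
A rough Go sketch of what this could look like from the fault-proof program's point of view; the interface and function names are hypothetical, and the RLP/trie decoding is elided:

```go
// Hypothetical sketch of the preimage-oracle idea: the only trusted input is an
// L1 block hash, and everything else is unpacked by asking for preimages.
// Names are illustrative, not an actual API.
package oracle

import "errors"

type Hash [32]byte

// PreimageOracle is the single "special" capability given to the fault-proof VM.
type PreimageOracle interface {
	// GetL1Blockhash returns a trusted L1 block hash (e.g. what BLOCKHASH
	// returned when the assertion was made).
	GetL1Blockhash(number uint64) (Hash, error)
	// GetKnownPreimage returns the preimage of a keccak256 hash, supplied
	// off-chain by the prover. Its correctness is what the dispute game must
	// ultimately be able to check on-chain.
	GetKnownPreimage(h Hash) ([]byte, error)
}

// ReadTransactionData walks from the block hash down to a transaction body.
// Each step only ever consumes data obtained via GetKnownPreimage, so the block
// hash remains the sole root of trust.
func ReadTransactionData(o PreimageOracle, blockNumber, txIndex uint64) ([]byte, error) {
	blockHash, err := o.GetL1Blockhash(blockNumber)
	if err != nil {
		return nil, err
	}
	header, err := o.GetKnownPreimage(blockHash) // RLP-encoded header
	if err != nil {
		return nil, err
	}
	txRoot := extractTransactionsRoot(header)
	// ...then walk the transactions trie node by node, each node fetched with
	// GetKnownPreimage(nodeHash), until the leaf for txIndex is reached.
	_ = txRoot
	return nil, errors.New("trie walk elided in this sketch")
}

func extractTransactionsRoot(headerRLP []byte) Hash {
	// RLP parsing elided; in practice this is a few lines of decoding.
	return Hash{}
}
```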

However, there is a problem: the preimages, even if known off-chain, may be too big to do any verification on-chain. For example, imagine that a block's gas limit was entirely filled by a single massive transaction (identified by txhash) to a dummy account. It should be clear that there is no way to verify the get_known_preimage(txhash) step on-chain, because there's not enough gas. How should we deal with this?

Solution 1: Upper bound on preimage sizes

The most obvious solution to this is just to enforce a maximum preimage size that can be successfully loaded into memory by the L2 VM. This would mean that get_known_preimage returns null_bytes32 if the real preimage.length > UPPER_BOUND.
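
A minimal sketch of this rule (the bound value and oracle signature are illustrative, not real constants or APIs):

```go
// Sketch of Solution 1: preimages above a fixed size simply do not exist as far
// as the L2 VM is concerned. UpperBound and the oracle signature are illustrative.
package oracle

const UpperBound = 120_000 // maximum preimage size in bytes (illustrative)

// rawPreimage stands in for the underlying "get_known_preimage" capability.
type rawPreimage func(hash [32]byte) ([]byte, error)

// getBoundedPreimage returns nil -- the null_bytes32 case in the text -- when
// the real preimage exceeds the upper bound, instead of loading it into memory.
func getBoundedPreimage(get rawPreimage, hash [32]byte) ([]byte, error) {
	pre, err := get(hash)
	if err != nil {
		return nil, err
	}
	if len(pre) > UpperBound {
		return nil, nil // caller treats this as "no preimage available"
	}
	return pre, nil
}
```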

To detect this on-chain, an honest party would just allow their challenger to run out of time on the dispute game because it is impossible to prove anything else. This is the easiest to implement, but it has the downside that the fraud proofs can no longer validate arbitrary L1 blocks -- only those below some size constraint.

Solution 2: Secondary game on preimage chunks

The keccak algorithm itself can be broken down into smaller individual steps, each of which ingests a fixed-size chunk of the input into a constant-sized state. So another solution is to have the two parties submit the preimage they claim is correct in smaller chunks as calldata. Once all are submitted, they can play a bisection game on the hash substeps to prove that their preimage results in the requested hash. This adds complexity, but does allow us to validate arbitrary L1 blocks in their entirety.
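
For intuition on the chunked computation: keccak256 can already be absorbed incrementally. A small Go example using golang.org/x/crypto/sha3 (the chunking below just demonstrates equivalence with the one-shot hash, not the on-chain game itself):

```go
// Demonstrates that keccak256 over a large input can be computed as a sequence
// of small absorption steps over a constant-sized internal state, which is what
// makes a bisection game over hash "substeps" conceivable.
package main

import (
	"bytes"
	"fmt"

	"golang.org/x/crypto/sha3"
)

func main() {
	input := bytes.Repeat([]byte{0xAA}, 1_000_000) // a large preimage

	// One-shot hash.
	oneShot := sha3.NewLegacyKeccak256()
	oneShot.Write(input)
	want := oneShot.Sum(nil)

	// Chunked hash: each Write absorbs another chunk into the sponge state.
	// 136 bytes is the keccak256 rate; any chunking gives the same result.
	chunked := sha3.NewLegacyKeccak256()
	for off := 0; off < len(input); off += 136 {
		end := off + 136
		if end > len(input) {
			end = len(input)
		}
		chunked.Write(input[off:end])
	}
	got := chunked.Sum(nil)

	fmt.Println(bytes.Equal(want, got)) // true
}
```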

karlfloersch commented 2 years ago

@ben-chain dope writeup!

Is there a really simple way to explain why the keccak in Solution 2 has to be a sub-game as opposed to being a part of the normal execution trace? I just don't feel like I have really good intuitions there. Here's my best shot:

It is because of what @protolambda calls "state poisoning" - this happens when state is loaded into the machine without the ability to disprove its validity. The reason why you cannot disprove the first chunk's state access is that the validity of that state access actually depends on the final hash that is computed; only once that hash is known can you retroactively say "that was an invalid chunk".

The part of this that I feel hazy about is whether it is helpful to determine where you disagree in the hash chunking process, and then afterwards post all the data.

Random ideas

karlfloersch commented 2 years ago

@ben-chain can we write this in a way such that it is non-interactive? IMO that would simplify it conceptually a lot. Non-interactive but expensive. Even though it sounds somewhat nightmarish to do, it would be super isolated & seems relatively easy to test.

ben-chain commented 2 years ago

It is because of what @protolambda calls "state poisoning" - this happens when state is loaded into the machine without the ability to disprove its validity.

This is the right intuition, yes.

The problem with "trying to find out where you disagree" is that an attacker will always try to disagree at the first chunk, devolving the game into rolling up the entire thing.

maurelian commented 2 years ago

It's super important that we're not limited by calldata size.

@karlfloersch this statement isn't obviously true to me, but perhaps I'm missing something?

I can't think of many on-chain applications that in practice would depend on being able to receive 30MM gas worth of call data. Until recently they were all making do with more like 13MM on L1 anyways.

IMO, implementing an on-chain sponge function (as fun as it sounds!) is a major endeavour in itself. If it is essential to the mission of creating an EVM-equivalent ORU then we must do it, but if it only results in something like a 20% to 30% increase in maximum TxData size, I'm less certain of its value.

karlfloersch commented 2 years ago

this statement isn't obviously true to me, but perhaps I'm missing something?

It's less about the feature and more about protocol complexity. We have the choice of highly encapsulated complexity (implement a pre-image verifier for any size preimage) vs. leaky complexity across multiple parts of our protocol. We need to add logic in Geth to reject txs over a specific size, make the preimage oracle return nil for large preimages, and add a special step in our dispute game to handle resolutions which time out. In our last codebase we suffered death by 1,000 cut corners that bloated protocol complexity beyond belief.

Some other places where I'm afraid of complexity popping up:

  1. It becomes difficult to detect whether a deposit can actually be processed by the L2. Ben mentioned this before, but you can basically emit an enormous number of zero bytes as an event and then deposit. If the receipt becomes too large then the fraud proof will skip the deposit, but the contract has no way to know that it was skipped, so we gotta add some extra logic into our deposit watcher. This also might have other downstream effects.
  2. It is the only fraud proof step where a timeout actually produces a value. That means that theoretically you could have a preimage that is small, let the game time out, and then it'll be "validly" interpreted as nil. This is a weird property and may be able to be used in tricky ways.

TLDR: I agree that supporting large txs is not a feature that users are clamoring for; however I think using this seemingly simple timeout approach will actually introduce gnarly complexity. + ofc EVM equivalence.

lightclient commented 2 years ago

Okay thanks for humoring me again on the call. I think I understand the interactive game over the commitment chain now. So if my understanding is correct, I can cause an otherwise valid chain to roll back at any point during the fraud proof period:

Suppose I submit the rollup transition A -> B with some input I. This is unequivocally a valid transition. At some later point, I initialize a fraud proof. I claim A -> B occurred with input I' and I attest to a sequence of computation C that shows I' is the input. Of course this is impossible because I is actually the input. However, if I, in my proposer role, agree with challenger-me that C proves I' is the input, then the transition A -> B is fraud-proved in the context of I'. This would almost certainly cause a rollback to A.

It seems likely that this could be used to double spend since most people won't wait for rollup finality, only L1 finality.

ben-chain commented 2 years ago

So there are two things I would say here:

  1. We should not allow creating an assertion that has an invalid input. In particular, we should enforce that the input to the execution trace can only be a valid L1 blockhash, e.g. the result of BLOCKHASH at the time the assertion was made. So long as the input is a valid L1 blockhash, the preimage oracle allows us to derive all other data (which would include implicit "inputs", e.g. actual L1 calldata) from this starting point.
  2. The actions of proposers and challengers should be ignored when discussing the VM -- so long as there is a deterministic trace based on the input, we should assume that honest dispute-game agents can take care of the rest.

lightclient commented 2 years ago

Ah I forgot that each interaction needs to be disputable. I think in that case my example doesn't work, because one honest party could dispute my fraudulent fraud proof.

ben-chain commented 2 years ago

It becomes difficult to detect whether a deposit can actually be processed by the L2. Ben mentioned this before but you basically emit an enormous amount of zero bytes as an event and then deposit. If the receipt becomes too large then the fraud proof will skip the deposit but the contract has no way to know that it was skipped, so we gotta add some extra logic into our deposit watcher. This also might have other downstream effects.

Yep, IMO this is the worst part of the complexity -- I think fundamentally this forces us to choose between:

  1. Having two L1->L2 pathways:
    • a trustedReceiptEnqueue which must only be called when users know the transaction receipt will be bounded in size (i.e. no arbitrary events are possible from untrusted contracts)
    • an untrustedStoreEnqueue which does 2 SSTORES (one to bump a nonce, another to store hash(messageData)) and a standalone preimage indexer which allows us to call getPreimage(messageHash)
  2. Losing the efficiency and always requiring 2 SSTOREs per L1->L2 message.

With unlimited preimage sizes, you might also imagine a world in which L2 contracts can pay to "subscribe" to all L1 events of a given address/topic. I think this is fundamentally impossible without that.

ben-chain commented 2 years ago

Update on the above: @karlfloersch pointed out that we can do untrustedStoreEnqueue in 1 SLOAD+SSTORE, instead of two SSTOREs, by maintaining a hash onion of all processed deposit data instead of an array. At that point, it's probably not worth the protocol complexity to have two pathways, and instead just do the untrusted store all the time. At 5k gas, we may have to impose an artificial cost to rate-limit anyway.
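
A rough sketch of the hash-onion accumulator being described, in Go for illustration only (the real thing would live in the L1 deposit contract, and the update rule shown is an assumption, not the actual Bedrock encoding):

```go
// Sketch of a "hash onion" over deposit data: a single 32-byte accumulator is
// updated per deposit, so the contract needs one storage slot (SLOAD+SSTORE)
// instead of an ever-growing array. Illustrative only.
package main

import (
	"fmt"

	"golang.org/x/crypto/sha3"
)

func keccak256(data ...[]byte) []byte {
	h := sha3.NewLegacyKeccak256()
	for _, d := range data {
		h.Write(d)
	}
	return h.Sum(nil)
}

func main() {
	onion := make([]byte, 32) // initial accumulator (all zeros)

	deposits := [][]byte{
		[]byte("deposit #1 message data"),
		[]byte("deposit #2 message data"),
	}
	for _, d := range deposits {
		// new_onion = keccak256(old_onion ++ keccak256(depositData))
		onion = keccak256(onion, keccak256(d))
	}
	fmt.Printf("onion commitment: %x\n", onion)
	// The L2 derivation (or fraud proof) replays the same updates from the
	// deposit data and must arrive at the same commitment.
}
```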

karlfloersch commented 2 years ago

This decision changes the implementation of the deposit feed. If we can load large preimages, then the deposit feed is a contract which just emits events. If we cannot load large preimages, then the deposit feed must enforce that the deposit data is small and store the hash of the deposit.

norswap commented 2 years ago

Trying to reformulate the whole thing based on semi-recent team conversations, and adding some new insights at the end, including a new way to operationalize it.


In the new design, we run the fraud proof on a run of Geth that validates an epoch's worth of blocks. The fraud proof consists of a dispute game where both parties converge on a single step whose execution they disagree on. At this stage, it is necessary to seed the preimage oracle with the preimages of any data accessed by the contested instruction. These data could be:

  1. Geth program data, proved against a merkleized execution state.
  2. L1 header properties, proven against an L1 block hash
  3. Part of the calldata, proven against an L1 block hash
  4. A deposit, proven against the receipt hash (itself proven against an L1 block hash)

We would compile Geth to MIPS (or some similar minimal instruction set). Above, (1) is a MIPS instruction, while (2), (3) and (4) map to "special" instructions (the "preimage oracle") to access the fraud proof program's input (data in some L1 block).

The trouble is case (4), because an entry in the receipt trie comprises all log entries for the transaction. A contract that submits a deposit could first emit a huge log filled with 0, which would exceed what can be held in a block's worth of calldata.

(Side note: we can't store deposits in calldata because we want to let contracts submit deposits, and only EOAs can produce "top-level" transactions.)

As Karl says just above, a possibility is simply to check the size of deposits and store their hash.

The other option is to somehow enable submitting the proven receipt trie entry in multiple stages. This is what all the talk about decomposing the hash function is about.

EDIT: The below does not work, see @protolambda comment and my follow-up comment with a method that works.

~The good news (and this is the new insight) is that this could be fairly easy.~

~Instead of having a "special instruction" that fetches the deposit directly, we could implement the proving logic for the deposit in our modified Geth directly and instead the special instructions are used to fetch the pre-seeded proof inputs (i.e. the multiple chunks of the receipt entry + the rest of the merkle proof + the deposit being proven).~

~Keccak is most definitely a hash function where the input can be supplied incrementally. So we would first compute the hash of the entry (after verifying it contains the deposit), then proceed to validate the rest of the fraud proof.~

~Because of the magic of the dispute game, we won't actually need to perform most of these steps. In fact, at worst, both parties will be forced to disagree on a single chunk of the receipt entry.~

~And we will be able to reuse a Go implementation of Keccak instead of implementing it onchain (though it must be said, it's surprisingly not quite that bad).~

protolambda commented 2 years ago

The problem is not the computation, but the authentication, of input data. There is no way to prove the input to the hash function is correct unless you can verify it against the output in the same step of the dispute. Rollups work because the inputs are all already authenticated: the hash function incrementally computes the output over trusted memory. But when loading new pre-images, the input is not authenticated, and has to be consumed in one go to establish that it's the correct input to use. See "state-poisoning" for earlier thoughts on this.

To solve the computation you can indeed do a rollup-like multi-step computation of the keccak hash, and dispute steps in-between, but the problem is that the input data itself is never authenticated. Determining the contents of the very first step, when you only have the final output hash to trust the input, is not possible.

You could try to reverse the execution of the steps, by presenting the last chunk of data and checking it matches the output when combined with the internal hashing state, and then working your way back through all the chunks. But that's not secure in this case, since keccak does not follow a Merkle–Damgård construction like sha256.

norswap commented 2 years ago

Figured out the same over the weekend.

Essentially, the issue is that a chunk needs to be proved upstream of the fraud proof (while seeding the preimage oracle) in case the parties disagree on it.

(Note that the term "preimage oracle" is really awkward here.)

Given this, the protocol needs to prove the whole receipt entry. Here is how this could be done while seeding the preimage oracle:

  1. User (challenger or prover) proves the hash of the receipt entry (this is done using the usual Merkle verification method, edited so that the hash of the value can be supplied in place of the value). Let's call this proveLargePreimageHash.
  2. The user sends all the chunks of the preimage except the last one, one at a time. Let's call this largePreimageHashUpdate. All (or almost all) hashing algorithms support feeding data into the hash progressively (e.g. the Java interface; check the update and digest methods).
  3. Send the last chunk of the preimage and compute the hash, verify it against the hash submitted in (1). Let's call this largePreimageHashFinalize.

This logic only proves a receipt entry; additional logic is required to parse the entry and extract the logs matching deposits. This can be done in Solidity while seeding, or in Geth directly (the difference would be whether the "special instructions" fetch a whole receipt entry or just a deposit).
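
A rough Go sketch of those three steps, using an incrementally updated keccak256. In reality this would be an L1 (Solidity) contract; the method names mirror proveLargePreimageHash / largePreimageHashUpdate / largePreimageHashFinalize above, and everything else is assumed for illustration:

```go
// Sketch of the three-step large-preimage protocol described above. The Merkle
// proof that ties claimedHash to an L1 block hash is elided.
package preimage

import (
	"bytes"
	"errors"
	"hash"

	"golang.org/x/crypto/sha3"
)

type LargePreimageSession struct {
	claimedHash []byte    // step 1: hash proven against the receipt trie
	hasher      hash.Hash // running keccak256 state, fed one chunk at a time
}

// ProveLargePreimageHash: the claimed hash has already been proven against an
// L1 block hash via a Merkle proof (elided here); record it and start hashing.
func ProveLargePreimageHash(claimedHash []byte) *LargePreimageSession {
	return &LargePreimageSession{
		claimedHash: claimedHash,
		hasher:      sha3.NewLegacyKeccak256(),
	}
}

// LargePreimageHashUpdate: absorb one more chunk of the preimage.
func (s *LargePreimageSession) LargePreimageHashUpdate(chunk []byte) {
	s.hasher.Write(chunk)
}

// LargePreimageHashFinalize: absorb the last chunk and check the result
// against the hash proven in step 1.
func (s *LargePreimageSession) LargePreimageHashFinalize(lastChunk []byte) error {
	s.hasher.Write(lastChunk)
	if !bytes.Equal(s.hasher.Sum(nil), s.claimedHash) {
		return errors.New("submitted chunks do not hash to the proven receipt-entry hash")
	}
	return nil
}
```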

protolambda commented 2 years ago

Running all the computation on-chain works, but it's very costly (a full L1 block or more!), so let's consider the alternatives as well.


One other approach is to do a fraud-proof over the thing that generated the large pre-image (L1 execution) at the same time (execute the L1 source block before executing the derived L2 block(s)). Then keep the receipt (or whatever large input) in the authenticated memory, to avoid using the pre-image oracle for it later.

The pros:

The cons:


Or another alternative is to merkleize receipts in some better way on L1, and see if it can be adopted during the Merge (transactions are already going to be merkleized nicely, and will be accessible through a binary merkle proof via the beacon-block hash-tree-root)

norswap commented 2 years ago

Been thinking about this over the weekend, finally took the time to write it out. I think this works, but I'd appreciate a review in case I missed something.

The Issue

When executing the single execution step during a fault proof, it is necessary to prove the input to the execution step. This could include:

  1. Part of a transaction from a sequencer block (posted by the sequencer as batch calldata).
  2. Part of a transaction from a deposit block (emitted as an event by the deposit contract).
  3. Part of the EVM state (e.g. balances, storage slots).
  4. Part of the fault proof program state (i.e. program memory).

The problem occurs in (2): all log entries generated by a transaction are concatenated before being hashed for inclusion in the receipts Merkle tree. A malicious actor could emit huge log entries before making a deposit, hence resulting in a hash pre-image that is too large to post on-chain as calldata.

See here for some numbers: a byte of log data costs 8 gas, so a 30M gas block allows roughly 30,000,000 / 8 = 3,750,000 bytes (~3.75 MB) of log data. Assuming none of these bytes are 0, this would take two full blocks' worth of calldata to post on chain. This is much too large/expensive.

Note that this problem does not occur in (1), since we can limit the batch size at the protocol level (define larger batches as invalid), nor in (3) as slots are only 256 bits, nor in (4) as we are free to define how we Merkleize program state.

The Nice Solution

First, we can prove the hash of the receipt Merkle tree entry using the usual Merkle proof mechanism, cut short so that we prove only the hash of the data rather than the data itself.

The problem is then: how to prove a chunk of the hashed input without posting the whole input on-chain.

The Keccak (SHA-3) hashing algorithm can be decomposed as the repeated application of a function (let's call it f) over the constant-size internal state and a chunk of input.

Let's call the input (the receipt entry) I and let's slice it into blocks whose size is a multiple of the SHA-3 block size (1088 bits): I0, I1, ..., In (we'll probably need to allow the last block to be up to twice as big and run f twice for it).

Then the hash can be represented by:

I -[SHA3]-> H

(i.e. SHA3(I) = H), or by:

S0 -[f]-> S1 -[f]-> ... Sn -[f]-> H
I0 /      I1 /          In /

i.e. f(S0, I0) = S1 (etc) where S1, ..., Sn are the intermediate internal states of SHA3 and S0 is the initial state.

The two participants can post all the intermediate states they think lead to the hash. If they agree, any single chunk of input Ix can be proven by verifying f(Sx, Ix) == Sx+1 on chain.

"All internal states" might be too big in practice, but they can post a commitment to them instead. Then, they can play a bisection game on this commitment to find the last intermediate state they agree on.

If the participants disagree on some intermediate state, we take Sy, the last intermediate state they agree on when working backwards from H, i.e. they agree on Sy but disagree on Sy-1 (taking Sy = H if they don't agree on Sn). Then both participants have to provide Sy-1 and Iy-1 such that f(Sy-1, Iy-1) == Sy can be verified by running f on chain.

If f has preimage resistance (which I think is the case but we should get a cryptography expert to look over this), then there is no way for the participants to find different values of the (Sy-1 , Iy-1) pair that satisfy the equation, hence one of the two parties must fail the check (or fail to answer) and be slashed as a result.
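
To make the mechanics concrete, here is a Go sketch of "commit to intermediate states, then re-execute one disputed step". It deliberately uses SHA-256, whose standard-library implementation can snapshot and restore its internal state via encoding.BinaryMarshaler / encoding.BinaryUnmarshaler; as the following comments discuss, keccak's sponge construction does not anchor the scheme to the final hash in the same way, so treat this purely as an illustration of the step-verification mechanics, not a working scheme for SHA-3.

```go
// Both parties commit to intermediate hashing states S1..Sn, and a single
// disputed step f(Sy-1, Iy-1) == Sy is re-executed for verification.
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"fmt"
)

const chunkSize = 64 // SHA-256 block size in bytes

// intermediateStates returns the serialized hashing state after each chunk,
// plus the final digest.
func intermediateStates(input []byte) ([][]byte, []byte) {
	h := sha256.New()
	var states [][]byte
	for off := 0; off < len(input); off += chunkSize {
		end := off + chunkSize
		if end > len(input) {
			end = len(input)
		}
		h.Write(input[off:end])
		snap, _ := h.(encoding.BinaryMarshaler).MarshalBinary()
		states = append(states, snap)
	}
	return states, h.Sum(nil)
}

// verifyStep re-executes a single step: restore S_{y-1}, absorb I_{y-1},
// and check the result matches the agreed S_y.
func verifyStep(prevState, chunk, agreedNextState []byte) bool {
	h := sha256.New()
	if err := h.(encoding.BinaryUnmarshaler).UnmarshalBinary(prevState); err != nil {
		return false
	}
	h.Write(chunk)
	next, _ := h.(encoding.BinaryMarshaler).MarshalBinary()
	return bytes.Equal(next, agreedNextState)
}

func main() {
	input := bytes.Repeat([]byte{0x01}, 4*chunkSize)
	states, digest := intermediateStates(input)
	_ = digest

	// Suppose the bisection narrowed the dispute to the step absorbing chunk I2:
	// check f(S2, I2) == S3, where states[i] is the state after chunks 0..i,
	// so S2 = states[1] and S3 = states[2].
	ok := verifyStep(states[1], input[2*chunkSize:3*chunkSize], states[2])
	fmt.Println(ok) // true
}
```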

protolambda commented 2 years ago

@norswap Regarding the nice solution (splitting up the hash function): some hash functions have this feature natively, via the Merkle–Damgård construction.

Unfortunately I don't believe SHA-3 is one of those (iirc @ben-chain asked a friend to confirm when we discussed this option in November)

norswap commented 2 years ago

Yes, it uses the sponge construction. Reading about it right now, it seems clear it's not a simple repeated application of a function, but it almost is (the key difference is that there are two functions, one for the "absorption phase" and one for the "squeeze out phase", where the squeeze function does not take further input and does not xor).

It's not quite clear to me that the proposed scheme can't work. In fact, I'd say it has to work, but the concern is the degree of attack resistance: i.e. if you can reverse/find collisions for every step (function application), then you can find a collision for the whole hash, which is obviously not feasible at present. But it might be too expensive to find a full hash collision while still being cheap enough to find a collision for a single step.

protolambda commented 2 years ago

Unlike other rollup execution (repeated function application), this is not on top of prior authenticated data. We only have the output, and repeat the steps by showing the supposed pre-state and applying the function to see if it matches the output we already authenticated.

Finding collisions on an intermediate input can be a lot easier, depending on the hash function design, since part of the security of the full hash may derive from the output being unique to a certain starting state of the hash function -- which we don't have to show if we only present a collision for an intermediate step that connects with the expected output.

norswap commented 2 years ago

Yup, that's exactly what I meant! Remains to figure out what the actual security here is.

norswap commented 2 years ago

Okay, so I spent some time thinking about this, and I've convinced myself it isn't possible.

The reason is actually pretty interesting. It turns out that what I called f (warning: it's not what the literature calls f) is reversible. This is actually great: if you agree on a suffix of all internal states, you can always prove the input that you disagree upon. The problem is that if you don't agree on the last internal state, you cannot retrieve (or even commit to) that state from the hash, as it is essentially truncated from it (and so finding collisions is trivial).