L2 block number cannot be a transaction index

ethereum-optimism / optimistic-specs

Optimistic: Bedrock, is a protocol that strives to be an extremely simple optimistic rollup that maintains 1:1 compatibility with Ethereum

MIT License

L2 block number cannot be a transaction index #296

Open saurik opened 2 years ago

saurik commented 2 years ago

So, ever since getting into running an Optimism archival node this past week (as well as seeing some discussion about the block number value on Discord), I have been thinking about the semantics of Optimism blocks, and I am now pretty certain that it is a "serious bug" in the Optimism platform that the L2 block number--as reported by the NUMBER EVM instruction (as Optimism-specific mechanisms aren't relevant)--is merely a transaction index with no block-like semantics.

I realize most people talking about blocks are using them wrong: they are talking about "regular intervals" or how they correlate with wall time. I am not making these mistakes (and in fact my involvement at first on Discord was to explain to someone why they probably didn't mean to use the block number for their purpose). Instead, the true goal of the block number is to act as a measure of "potential atomicity" between multiple transactions.

We see this usage in systems such as Uniswap2, which relies on the block number to prevent attacks on its price oracle: they always use the price from the previous block, which means an attacker that wants to fool the oracle would have to perform a bad trade with a large value and then risk all of that money due to someone else potentially arbitraging them before they can take advantage of the fake pricing they injected and atomically reverting their trade. You can read more about this here:

https://uniswap.org/blog/uniswap-v2

To set the measured price to one that is out of sync with the global market price, an attacker has to make a bad trade at the end of a previous block, typically with no guarantee that they will be able to arbitrage it back in the next block. Attackers will lose money to arbitrageurs, unless they can "selfishly" mine two blocks in a row. This type of attack presents a number of challenges and has not been observed to date.
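To make the "price from the previous block" behavior concrete, here is a minimal Python sketch of a constant-product pool that checkpoints its oracle price only on the first trade of each new block. Everything in it (`Pool`, `oracle_price`, `swap_x_for_y`) is a hypothetical name for illustration, fees are ignored, and it keys on the block number as this thread does; the deployed Uniswap v2 pair contract accumulates prices by timestamp, so treat this as a simplification rather than the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy x*y=k pool (hypothetical, no fees) with a previous-block oracle."""
    reserve_x: float
    reserve_y: float
    last_block: int = 0
    oracle_price: float = 0.0  # price of X in Y before the first trade of last_block

    def spot_price(self) -> float:
        return self.reserve_y / self.reserve_x

    def _checkpoint(self, block_number: int) -> None:
        # Record the price only on the first trade of a new block, and do so
        # before that trade touches the reserves. Later trades in the same
        # block cannot change what the oracle reports.
        if block_number > self.last_block:
            self.oracle_price = self.spot_price()
            self.last_block = block_number

    def swap_x_for_y(self, amount_x: float, block_number: int) -> float:
        self._checkpoint(block_number)
        k = self.reserve_x * self.reserve_y
        new_x = self.reserve_x + amount_x
        new_y = k / new_x
        amount_y = self.reserve_y - new_y
        self.reserve_x, self.reserve_y = new_x, new_y
        return amount_y
```

A large trade in block 1 moves the spot price immediately, but `oracle_price` only reflects it once some trade in block 2 triggers the next checkpoint. That gap is the window in which an arbitrageur can undo the manipulation, provided the attacker does not also control block 2.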

However, with Optimism, a sequencer is now in a position to perform this attack, as they have control over not merely the transactions within a single block, but the order of transactions that will execute across multiple blocks. Further, if the L2 sequencer is also an L1 miner, then they can actually mine an L1 block with multiple L2 batches with guaranteed atomicity... this means the batch number is also not a candidate for the result of NUMBER.

Instead, it seems like what you really need to do is take all batches in an L1 block and then attempt to process all of their transactions (in order, of course) as part of a single L2 block which uses a block number one larger than the previous L2 block number (as these should be dense: using the L1 block number would be inappropriate). Transactions that are now invalid (maybe due to not enough balance to pay for the value + gaslimit * gasbid) can simply be thrown out, such that they could be re-applied in a later sequencing batch.

The key thing is just that all transactions executed as part of the same L1 block must have the same L2 block number, to expose to L2 the atomicity of L1. I appreciate this might have other ramifications on your design, but I feel like they should be tractable, and this platform semantics mismatch means your chain isn't actually a secure way to execute popular EVM contract protocols (such as Uniswap2).
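The proposal above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `Tx`, `L2Block`, and `build_l2_block` are hypothetical names, and validity is reduced to the single balance check mentioned above (value + gaslimit * gasbid), charged up front instead of actually executing anything.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    sender: str
    value: int
    gas_limit: int
    gas_bid: int

    def max_cost(self) -> int:
        # The "value + gaslimit * gasbid" bound from the comment above.
        return self.value + self.gas_limit * self.gas_bid


@dataclass
class L2Block:
    number: int
    txs: list = field(default_factory=list)


def build_l2_block(prev_number, batches, balances):
    """Fold every batch found in one L1 block into a single L2 block.

    `batches` is a list of transaction lists, in the order they appear in
    the L1 block. `balances` maps sender to balance and is updated in place.
    Returns the new block and the transactions that were thrown out.
    """
    block = L2Block(number=prev_number + 1)  # dense numbering, not the L1 number
    dropped = []
    for batch in batches:
        for tx in batch:
            if balances.get(tx.sender, 0) >= tx.max_cost():
                balances[tx.sender] -= tx.max_cost()
                block.txs.append(tx)
            else:
                # Invalid now; could be re-submitted in a later batch.
                dropped.append(tx)
    return block, dropped
```

Every transaction that lands in the same L1 block ends up with the same L2 block number, which is the invariant the comment asks for.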

smartcontracts commented 2 years ago

We're thinking quite heavily about this issue. It's very likely that an update within the next few months will introduce a new scheme in which L2 blocks contain more than one transaction and are produced at a rate proportional to the L1 block production rate. cc @karlfloersch is that understanding correct?

smartcontracts commented 2 years ago

Moving over to optimistic-specs for further discussion

protolambda commented 2 years ago

In the new optimism upgrade (bedrock) we're adopting regular block based bundling of L2 txs, that should fix this

maurelian commented 2 years ago

Yeah, this will be fixed.

saurik commented 2 years ago

@protolambda FWIW, I can read "we're adopting regular block based building of L2 txs" in multiple ways, one of which is correct and one of which is only half-correct. "The key thing is just that all transactions executed as part of the same L1 block must have the same L2 block number, to expose to L2 the atomicity of L1."

trianglesphere commented 2 years ago

I don't think that bedrock solves this, but I also don't think that it's a tractable problem with a single sequencer. Unless multiple parties can submit batches that get pulled into L2, I'm not sure that your proposal (read all batches from L1 into L2) solves the problem.

We do do something similar with deposits to L1 allowing arbitrary L2 computation and those must be included immediately & follow the same L1 order, but batched transactions can still be ordered arbitrarily.

saurik commented 2 years ago

@trianglesphere I mean, "there must be multiple parties able to submit batches" is clearly another invariant the system must hold to be correct from this attack, but I am under the impression you already knew that? (...or is the intention for Optimism now to be centralized permanently?) Regardless, I do not consider it my job (or even a good idea) to convince you how to do it correctly, but if you agree that Bedrock doesn't solve the problem then this issue really shouldn't be closed.

norswap commented 2 years ago

@saurik

"The key thing is just that all transactions executed as part of the same L1 block must have the same L2 block number, to expose to L2 the atomicity of L1."

Just to make sure, you are talking about deposits (L2 transactions submitted on L1), right? If so then I can confirm that Bedrock fixes this: all deposits landing on a given L1 block will appear in the same L2 block.

If you're talking about all L2 transactions in general, then I'd love more explanations, because we have multiple L2 blocks per L1 block, and the transactions across these blocks are not atomic (nor do we want them to be, for latency's sake).

saurik commented 2 years ago

@norswap I mean "all L2 transactions in general", and already provided a detailed explanation of why (along with a reference to Uniswap's documentation, etc.) as the long comment at the top of this issue, which that quote is itself taken from.

norswap commented 2 years ago

Alright, I must confess I'd read this a while ago and sort of forgot the finer details.

Here's me walking through it again, and essentially reaching the same conclusion as @trianglesphere and @saurik (i.e. we need sequencer decentralization to fix this).

We see this usage in systems such as Uniswap2, which relies on the block number to prevent attacks on its price oracle: they always use the price from the previous block, which means an attacker that wants to fool the oracle would have to perform a bad trade with a large value and then risk all of that money due to someone else potentially arbitraging them before they can take advantage of the fake pricing they injected and atomically reverting their trade.

I will note this is theoretically feasible for large mining pools today, who could wait until they mine two blocks in a row to perform such an attack. Of course, not done in practice, probably not economically beneficial & potentially illegal.

The point about our sequencer being able to do this trivially is well taken.

Instead, it seems like what you really need to do is take all batches in an L1 block and then attempt to process all of their transactions (in order, of course) as part of a single L2 block which uses a block number one larger than the previous L2 block number (as these should be dense: using the L1 block number would be inappropriate). Transactions that are now invalid (maybe due to not enough balance to pay for the value + gaslimit * gasbid) can simply be thrown out, such that they could be re-applied in a later sequencing batch.

It seems to me like the oracle attack can be performed as soon as a single party (the sequencer in our case) can mint two blocks in a row, which is hard to avoid in the rollup design.

I think what you're suggesting is that Uniswap could use "the price in the last batch as seen on L1" (by making all batches landing in a given L1 block into a single L2 block) instead of "the price in the last L2 block" to avoid this attack. Is that right?

I don't think this works however, the sequencer being free to manipulate the oracle in what gets posted to the batch, then to simply censor any uniswap transaction until he gets his exploiting transaction (+ price revert) in.

So I think @trianglesphere got this exactly right (though I didn't appreciate it when I first read his answer, I must admit) and this can't be fixed minus sequencer decentralization.

I personally don't yet have a good vision on how sequencer decentralization is going to work, I've opened this discussion to track it: https://github.com/ethereum-optimism/optimistic-specs/discussions/305

smartcontracts commented 2 years ago

Going to reopen for now

trianglesphere commented 2 years ago

"there must be multiple parties able to submit batches" is clearly another invariant the system must hold to be correct from this attack

We sidestep this by having two ways to include transactions: the first is deposits (transactions sent to an L1 contract that the rollup system watches). Sending deposits is permissionless and forces the chain to include those transactions. The other way to include a transaction is to send it to the L2 mempool/sequencer. These transactions then get submitted as batches to L1. Right now only a single party is able to submit batches. We do want to distribute sequencing, but it's not well specified at this point.
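As a rough sketch of the two inclusion paths just described, the Python below derives L2 blocks from one L1 block's inputs. The function name and data shapes are hypothetical, and placing the deposits at the front of the first L2 block derived for that L1 block is an assumption made for illustration; the thread only states that deposits are forced in, keep their L1 order, and land in the same L2 block.

```python
def derive_l2_blocks(l1_deposits, sequencer_batches, next_l2_number):
    """Return (number, txs) pairs for one L1 block's worth of inputs.

    l1_deposits: deposit transactions in the order they appeared on L1.
    sequencer_batches: list of tx lists, one per L2 block the sequencer built.
    """
    blocks = []
    # Always emit at least one block, so deposits are included even if the
    # sequencer submits nothing (the permissionless path).
    batches = sequencer_batches or [[]]
    for i, batch in enumerate(batches):
        # Deposits are forced into the first block, in L1 order; the
        # sequencer only controls the ordering of its own batch.
        forced = list(l1_deposits) if i == 0 else []
        blocks.append((next_l2_number + i, forced + list(batch)))
    return blocks
```

Note that nothing here constrains how the sequencer orders its own batches across consecutive L2 blocks, which is the part of the design the oracle concern is about.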

Given we're running the EVM as an L2, some of the approaches to allowing multiple parties to submit batches do not work as well (btc/utxo models work well on top of an append only log, the state model runs into more problems with ensuring that there are not knock on effects of including transactions).

or is the intention for Optimism now to be centralized permanently

This is not the intention, but it's also an optimistic system: We generally assume that actors perform their role correctly and disputing invalid actions will always have a higher latency than the happy path.

saurik commented 2 years ago

I don't think this works however, the sequencer being free to manipulate the oracle in what gets posted to the batch, then to simply censor any uniswap transaction until he gets his exploiting transaction (+ price revert) in.

So I think @trianglesphere got this exactly right (though I didn't appreciate it when I first read his answer, I must admit) and this can't be fixed minus sequencer decentralization.

@norswap Again: I consider it obvious that a system with a single centralized sequencer could never be safe to use, and so--since I was under the impression that Optimism agreed (and has merely decided to do the common thing of launching the system anyway)--I have been discounting it out of hand, and was certainly not claiming that some trivial change could fix that underlying centralization. I thereby did not claim that @trianglesphere was wrong for stating that a decentralized sequencer was required, but in fact the opposite: the overly-blunt tl;dr of my already-a-bit-blunt response (sorry... it was just really frustrating to see this get summarily closed) is not "nuh uhh" but is instead "well, duh" ;P.

The system I described where L2 is entirely driven by atomicity decisions on L1--which you have discounted "for latency's sake" (as speed apparently trumps correctness)--happens to solve both problems at once (though it causes many other domino effects on the overall design, all of which seem worth it to me if it makes the system finally function correctly? it isn't clear to me how much of the system would have to be reimplemented to go down that path, but yeah: it wouldn't surprise me if it were "nearly all of it"... this is definitely the kind of analysis I feel like one should do before bothering to type any code ;P).

We sidestep this by having two ways to include transactions: the first is deposits (transactions sent to an L1 contract that the rollup system watches). Sending deposits is permissionless and forces the chain to include those transactions.

@trianglesphere Can you explain how this solves the atomicity problem? It would seem like it almost trivially can't, as one of my attackers was an L1 miner either colluding with or acting as the sequencer (and so the L1 miner could guarantee that no such "deposits" were injected between two L2 "batches", assuming that is even really possible today as I feel like you are staring at some kind of censorship attack rather than one based on atomicity requirements of blocks).

I am pretty sure you have to simultaneously satisfy both invariants: you can't have only a single producer of blocks (the "obvious" requirement) and you also can't have the L2 blocks run faster than the L1 blocks (...unless you break away from an assumption I am making that the L1 actually has any kind of "truth" to it? if you want to just give up and run your own consensus algorithm--as I see you doing in the other thread... I wrote a really long comment about that one that I likely shouldn't post: like, I should just let you work on this now in peace, and might already have had I not been seemingly so horribly misunderstood here ;P--you probably just end up with Polygon, but I agree that if you don't actually rely on the L1's consensus algorithm you aren't beholden to it either... it just starts to not feel like an L2 anymore, though maybe you have a looser definition of what you are wanting to accomplish).

norswap commented 2 years ago

Again: I consider it obvious that a system with a single centralized sequencer could never be safe to use, and so--since I was under the impression that Optimism agreed

Actually, I think we don't agree at all!

A single sequencer system is safe, as long as there is a single honest validator that can make a fault proof.

I guess it is open to oracle attacks like the one you describe (but is that "safety"? is L1 unsafe because it would enable an oracle manipulation attack if Uniswap did not check the price from the last block? is L1 unsafe because a miner that mines two blocks in a row can perform such an attack?)

Totally agree we want to make this attack, which isn't (realistically) feasible on L1, also not feasible on L2.

A single sequencer system is also live — as long as there is a bonded proposer to submit output roots (which they can do by reading L1 inputs only, so no need to rely on the sequencer), then it's still possible to transact on L2 via deposited transactions (L2 transactions sent on L1) and to withdraw when the output roots become finalized.

Now, a single sequencer system is not ideal of course, since if the sequencer goes dark you can only transact via L1, which obviously removes any L2 scalability and cost-reduction benefits. But the system stays safe and live.

norswap commented 2 years ago

The system I described where L2 is entirely driven by atomicity decisions on L1--which you have discounted "for latency's sake" (as speed apparently trumps correctness)--happens to solve both problems at once

I don't really see what this system solves, to be honest!

To be clear, our current system is "L1-driven" in the sense that the canonical chain can be derived exclusively from L1. However, this canonical chain lags the actual/unsafe L2 chain as it is being built by the sequencer (to be clear it's only "unsafe" in the sense it hasn't been included on L1, so should not be relied on by economic bridges for instance!)

What I understand from your proposal (and tell me if I'm getting this wrong) is that you want the sequencer to start building a new L2 block only after the previous L2 block (which in this case would be isomorphic to a batch) has landed on L1. This means exposing state changes resulting from all the transactions in this L2 block atomically only after posting the block to L1.

The problem is indeed latency. This means we can't have better than 12 second L2 blocks, and we can't even guarantee that a batch will land on L1 at every block, so worse on average. L1 has too high latency for some use cases we want to enable, and in general better latency is better (more accurate prices, games, ....). Let's also mention the terrible consequences on tx prices (we currently submit a batch approximately every 5 minutes, though this could get more frequent in the future, as usage increases).
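The latency argument can be put in numbers with a small back-of-the-envelope sketch. The model is an assumption, not something stated in the thread: each L1 block independently includes the pending batch with probability `p_inclusion`, so the wait is geometric with mean 1/p blocks. The 12 second and 5 minute figures are the ones quoted above; the function names are hypothetical.

```python
L1_BLOCK_TIME_S = 12.0  # L1 block time quoted in the comment above


def expected_l2_block_time(p_inclusion):
    """Mean seconds between L2 blocks if one L2 block needs one landed batch.

    Assumes each L1 block includes the batch independently with probability
    p_inclusion, so the expected wait is 1 / p_inclusion L1 blocks.
    """
    if not 0.0 < p_inclusion <= 1.0:
        raise ValueError("p_inclusion must be in (0, 1]")
    return L1_BLOCK_TIME_S / p_inclusion


def l1_blocks_per_batch(batch_interval_s):
    """How many L1 blocks pass between batches at a given posting interval."""
    return batch_interval_s / L1_BLOCK_TIME_S
```

Under this model, a batch landing in every L1 block gives the 12 second floor, a 50% inclusion rate doubles it to 24 seconds, and the current posting cadence of roughly 300 seconds corresponds to about 25 L1 blocks per batch.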

In fact, I will argue that this design is absolutely the design from hell. Because it would be easy in practice to censor the batches, leading to the L2 blocktime becoming ginormous. This would in turn cause a massive amount of drift between L2 prices and L1 prices. This drift causes arbitrage opportunities to pop up. In turn, these arbitrage opportunities incentivize ALL L1 miners/block proposers to censor L2 batches by default, in order to create these arbitrage opportunities which they will later be able to benefit from. For instance, if they use Flashbots (as the vast majority of them do), they just benefit as MEV searchers bid against each other in order to exploit the L2 arbitrages. This would degrade the L2 experience to the point of making it unusable. Not to mention that with huge blocks, all your swaps would fail because of slippage. Not to mention that huge blocks might not be supported in geth. etc.

On the flipside, I don't see what the design fixes? It doesn't fix the oracle manipulation: you need to ensure a single sequencer can't create two blocks in a row for that. It could be argued that if deposits are included from the same L1 block where a batch/L2 block is submitted, someone else could arb, but this can be prevented by the sequencer submitting a flashbot bundle containing the batch and a deposit doing the exploit.

It also doesn't fix safety or correctness, since as argued above, there is nothing to fix there!

norswap commented 2 years ago

Also (and I'm sorry for the spam!), it's pretty much not recommended to use the Uniswap price as an Oracle anyway, see here: https://shouldiusespotpriceasmyoracle.com/

True, you can't do atomic attacks if Uniswap reports last block's price as oracle. However, here's an attack you can perform regardless:

1. Pick an asset that satisfies the following conditions:
   - has only a single liquidity pool with any significant liquidity in it
   - can be used as loan collateral
   - pool characteristics and collateral ratio enable the attack (in practice, this probably means that the liquidity in the pool can't be too high either; the exponential curvature of AMM curves at the edges will do the rest)
2. Manipulate the price in block 1.
3. Borrow against half of the assets at the inflated price in block 2.
4. Partly reverse your price manipulation by selling half of the asset in block 2.
5. Repay your loan; the collateral should now be worth less than your cost basis for the repaid assets (cost of acquiring all of them minus amount recovered by selling half), and this difference is the profit of the attack.

(It might even be beneficial to borrow against all the assets if [amount loaned > cost basis] and just wait to be liquidated when the price comes down.)
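For a feel of the numbers, here is a toy Python walk-through of the steps above on a constant-product pool. All figures are invented for illustration: a pool of 100 units of the asset against 100 units of cash, no fees, and a lender that advances 75% of the oracle value (`LTV` is a hypothetical parameter). The sketch also takes one particular reading of the final steps: it compares what the attacker has extracted by the end of block 2 against the cost of the manipulation, treating the loan as left outstanding.

```python
LTV = 0.75  # hypothetical loan-to-value ratio offered by the lender


def buy_asset(pool_asset, pool_cash, amount_out):
    """Cash needed to take `amount_out` of the asset from an x*y=k pool."""
    k = pool_asset * pool_cash
    new_asset = pool_asset - amount_out
    new_cash = k / new_asset
    return new_cash - pool_cash, new_asset, new_cash


def sell_asset(pool_asset, pool_cash, amount_in):
    """Cash received for selling `amount_in` of the asset into the pool."""
    k = pool_asset * pool_cash
    new_asset = pool_asset + amount_in
    new_cash = k / new_asset
    return pool_cash - new_cash, new_asset, new_cash


def simulate():
    # Block 1: buy half the pool's asset, pushing the price up.
    cost, asset, cash = buy_asset(100.0, 100.0, 50.0)
    oracle_price = cash / asset  # end-of-block-1 price, reported in block 2

    # Block 2: borrow against half the holdings at the inflated price...
    loan = 25.0 * oracle_price * LTV
    # ...and sell the other half back, partly reversing the manipulation.
    proceeds, asset, cash = sell_asset(asset, cash, 25.0)

    return {
        "cost": cost,
        "oracle_price": oracle_price,
        "loan": loan,
        "proceeds": proceeds,
        "profit": loan + proceeds - cost,
    }
```

With these made-up numbers the manipulation costs 100, the oracle reports a price of 4 (up from 1), the loan is 75, and the sale recovers about 66.7, so the attacker is ahead by about 41.7 while the lender holds collateral that is worth far less once the price settles. A deeper pool or a lower `LTV` shrinks or removes that margin, which is why the conditions in step 1 matter.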