ethereum-optimism / optimistic-specs

Optimism Bedrock is a protocol that strives to be an extremely simple optimistic rollup that maintains 1:1 compatibility with Ethereum

Support calldata compression #10

Closed: karlfloersch closed this issue 2 years ago

karlfloersch commented 3 years ago

Background

We can get significant calldata savings by supporting compression. This can be as simple as zero-byte compression plus a lookup table. We should make it an objective to be able to cleanly fit this into the architecture.
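
For concreteness, a minimal sketch of the kind of zero-byte compression being floated here (runs of zeros collapse to a 0x00 marker plus a run length; the encoding and names are illustrative, not a proposed format):

```python
def compress_zero_bytes(data: bytes) -> bytes:
    """Replace each run of zero bytes with a 0x00 marker followed by the run length."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0x00, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress_zero_bytes(data: bytes) -> bytes:
    """Inverse of compress_zero_bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])  # bytes(n) is n zero bytes
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


# ABI-encoded calldata is full of zero padding, so even this trivial scheme helps.
# Example: ERC-20 transfer(address,uint256) calldata, 68 bytes -> 36 bytes.
calldata = (bytes.fromhex("a9059cbb")
            + 12 * b"\x00" + bytes.fromhex("ab" * 20)
            + 24 * b"\x00" + bytes.fromhex("0de0b6b3a7640000"))
assert decompress_zero_bytes(compress_zero_bytes(calldata)) == calldata
```

Note that an isolated zero byte expands from one byte to two under this scheme, which already hints at the cost-accounting questions discussed further down.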

smartcontracts commented 3 years ago

Yeah, this is crucial. We can use pretty much any compression algorithm as long as we can implement it on-chain.

karlfloersch commented 2 years ago

We should run numbers on how much simple stateless compression could buy us & consider adding it to block derivation if it's not expensive & saves a bunch of calldata.

tynes commented 2 years ago

Some related work:

karlfloersch commented 2 years ago

I think applying a very simple compression algorithm to all transactions could be huge & not impose a lot of cost on the protocol. We absolutely need to do some benchmarking. Vitalik has even written a simple compression algo & also has suggested using Snappy compression. We should dedicate a couple days to determining the performance characteristics of various compression algorithms & their savings in calldata size.
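
For a first pass at those numbers, something along these lines could be run over real L1 calldata (stdlib zlib/lzma below; Snappy would need the third-party python-snappy package; the synthetic input only stands in for real transaction batches):

```python
import lzma
import os
import time
import zlib


def sample_batch(n_txs: int = 500) -> bytes:
    """Synthetic ABI-style batch: selector + two zero-padded 32-byte args per tx."""
    txs = []
    for _ in range(n_txs):
        txs.append(os.urandom(4)                    # selector
                   + 12 * b"\x00" + os.urandom(20)  # address argument
                   + 28 * b"\x00" + os.urandom(4))  # small uint256 argument
    return b"".join(txs)


def bench(name, compress, data):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name:>8}: {len(data)} -> {len(out)} bytes "
          f"({len(out) / len(data):.2%}), {len(data) / elapsed / 1e6:.1f} MB/s")


data = sample_batch()
bench("zlib -1", lambda d: zlib.compress(d, 1), data)
bench("zlib -9", lambda d: zlib.compress(d, 9), data)
bench("lzma", lzma.compress, data)
```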

Here's a list of considerations that I've been thinking of as we explore compression:

  1. Adding compression makes tx fees a little more challenging. The L1 gas cost of a transaction goes from being trivial to compute (based on the size of the tx and the number of zero bytes) to being a bit trickier to compute (you may have to attempt to compress the tx to determine the L1 gas cost; see the sketch after this list).
  2. Are there ways to create transactions which are super difficult to compress or decompress? We definitely need to audit the compression algorithms for these sorts of properties.
  3. Is compression better to add at a 2nd layer? E.g. https://github.com/vyperlang/vyper/issues/2542 -- My guess is that complex compression is better to add at a 2nd layer, but adding simple compression would be huge for tx costs (at least in the near term, considering how terribly inefficient ABI encoding is).
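
To illustrate point 1, a hedged sketch of what fee computation starts to look like once compression is in the loop (zlib as a stand-in for the real algorithm, and the 16/4 per-byte costs are just the EIP-2028 calldata constants used for illustration):

```python
import zlib

# Illustrative constants: EIP-2028 calldata pricing.
ZERO_BYTE_GAS = 4
NONZERO_BYTE_GAS = 16


def l1_gas_uncompressed(tx: bytes) -> int:
    """Today: trivial, a function of the raw tx bytes alone."""
    zeros = tx.count(0)
    return zeros * ZERO_BYTE_GAS + (len(tx) - zeros) * NONZERO_BYTE_GAS


def l1_gas_compressed(tx: bytes) -> int:
    """With compression: the fee estimate has to actually run the compressor."""
    compressed = zlib.compress(tx, 9)
    zeros = compressed.count(0)
    return zeros * ZERO_BYTE_GAS + (len(compressed) - zeros) * NONZERO_BYTE_GAS
```
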
norswap commented 2 years ago

@karlfloersch

  1. Yes, we should try to wargame whether we can charge based on uncompressed calldata, or whether adversarial input could be a significant DoS/griefing factor. Intuitively, it seems hard to avoid that it will be possible to submit data that doesn't compress at all, so if we manage 50% compression on average, there is a 2x calldata cost amplification attack. We should absolutely avoid cases where the "compression" could be bigger than the original (I expect sensible algorithms to avoid this, but we should check; see the sketch after this list).
  2. Afaik, most popular compression algorithms are O(n) (which makes sense, as they're often used to compress gigantic corpuses). But yes, we should check.
  3. Guesses seconded. This should also be benchmarked.
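
A quick way to put numbers on the worst case from point 1 (zlib as a stand-in; random bytes are essentially incompressible, so the output can come out slightly larger than the input):

```python
import os
import zlib

typical = (28 * b"\x00" + os.urandom(4)) * 100  # ABI-style data, compresses well
adversarial = os.urandom(len(typical))          # random bytes, incompressible

for name, data in [("typical", typical), ("adversarial", adversarial)]:
    out = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(out)} bytes ({len(out) / len(data):.2f}x)")
```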

Some more notes:

protolambda commented 2 years ago

Regarding snappy compression: eth2 uses it in different places, and there are some fun edge-cases / security concerns!

Some of these may also apply to other compression methods; don't trust any method to do the right thing by default.

And intuitively I think we want to optimize for monotonous data (32-byte copy instructions pointing to previously decompressed data, maybe special handling of 0x00 and 0xff), and maybe not prioritize run-length encoding like some compression algorithms do. Curious to see some stats of different compression techniques on L1 transaction history!
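
As a starting point for those stats, a small pass like this (illustrative, meant to be fed real L1 calldata) would show how much of a batch is 0x00/0xff and how many 32-byte words repeat earlier ones:

```python
def batch_stats(data: bytes) -> dict:
    """Rough byte- and word-level stats for a batch of calldata."""
    words = [data[i:i + 32] for i in range(0, len(data), 32)]
    seen, repeated = set(), 0
    for w in words:
        if w in seen:
            repeated += 1  # could be encoded as a copy of an earlier word
        seen.add(w)
    return {
        "zero_byte_fraction": data.count(0x00) / max(len(data), 1),
        "ff_byte_fraction": data.count(0xFF) / max(len(data), 1),
        "repeated_word_fraction": repeated / max(len(words), 1),
    }
```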

vbuterin commented 2 years ago

Regarding adversarial input, one thing that should definitely be done at some point is charging transactions gas based on their contribution to the compressed size, and not based on their uncompressed size. This way adversarial input would not break anything economically.

If we use my zero byte compression algo, then this is easy, because you can just directly reduce the gas cost for sequences of zero bytes. For snappy you can't quite do that. But if the sequencer is trusted to set the fee, the sequencer could do something else that's pretty simple: track the total length of the compressed block while building it up and charge each tx for its marginal contribution to the compressed block size.
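
A hedged sketch of that marginal-contribution idea, with zlib standing in for whatever algorithm is actually chosen (Z_SYNC_FLUSH just makes the per-tx contribution observable):

```python
import zlib


class BlockCompressor:
    """Sequencer-side sketch: one streaming compressor per in-progress block."""

    def __init__(self):
        self._comp = zlib.compressobj(level=9)  # zlib as a stand-in
        self.compressed_size = 0

    def add_tx(self, tx: bytes) -> int:
        """Append a tx to the compressed block and return its marginal compressed size."""
        chunk = self._comp.compress(tx) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        self.compressed_size += len(chunk)
        return len(chunk)  # charge the tx's L1 data fee in proportion to this


# Later txs can back-reference earlier ones, so repeated data tends to get
# cheaper as the block fills up:
block = BlockCompressor()
tx = 28 * b"\x00" + b"\xde\xad\xbe\xef" + 31 * b"\x00" + b"\x01"
print(block.add_tx(tx), block.add_tx(tx))
```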

norswap commented 2 years ago

@protolambda @vbuterin Is there some rationale for choosing Snappy for the consensus layer? I assume its high throughput must be a big part of that?

In the case of calldata compression, we probably want to optimize for compression ratio. In the benchmarks, even the slowest algorithms hover at ~1 MB/s compression throughput, and that seems plenty for the foreseeable future (not to mention there are algorithms with great compression ratios and up to two orders of magnitude more throughput). We'll need to confirm how these numbers hold up with actual calldata.

@vbuterin Isn't charging marginal contribution to block size somewhat unfair to people whose transactions are placed at the start of the "compressed unit" (a batch?), where the opportunity for compression is small?

It also means any kind of fee estimation has to be run through the sequencer, or at least a node that tracks the sequencer closely. This should be doable. A good property of charging marginal block-size contribution is that it is a number expected only to go down as the compressed unit grows, so estimation errors are only overestimations, which are safe. However, that only holds if your transaction lands in the compressed unit that was ongoing at the time of estimation, and not at the start of a fresh one.

Speaking of "compressed unit", a batch of transactions is a natural fit, but maybe it sacrifices compression opportunity by virtue of being too small. We could consider compressing across multiple batches, up to a certain maximum size. This requires a compression algorithm that is "streaming", i.e. one that does not need to pass over the whole input before starting to emit the compressed stream.

It might even be possible to do "rolling window compression". For instance Snappy compresses by referencing segments of the decompressed stream. We could constrain the compression algorithm to only consider the last X bytes of the decompressed stream.
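
For what it's worth, DEFLATE already works this way: back-references are limited to a fixed window of the decompressed stream, so capping the window size caps the history a node has to keep. A sketch using zlib as a stand-in:

```python
import zlib

WINDOW_BITS = 12  # back-references limited to the last 2**12 = 4 KiB of history

compressor = zlib.compressobj(level=9, wbits=WINDOW_BITS)
decompressor = zlib.decompressobj(wbits=WINDOW_BITS)


def add_batch(batch: bytes) -> bytes:
    """Compress a batch as part of one long rolling stream (no per-batch reset)."""
    return compressor.compress(batch) + compressor.flush(zlib.Z_SYNC_FLUSH)


def read_batch(chunk: bytes) -> bytes:
    return decompressor.decompress(chunk)


batch = 31 * b"\x00" + b"\x2a" + 32 * b"\xff"
assert read_batch(add_batch(batch)) == batch
```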

Advantages of rolling window compression:

Disadvantages:


Other random remarks:

vbuterin commented 2 years ago

> @vbuterin Isn't charging marginal contribution to block size somewhat unfair to people whose transactions are placed at the start of the "compressed unit" (a batch?), where the opportunity for compression is small?

My honest answer to this is "meh, whatever".

Charging gas fairly for compressed data is a much harder problem than just compressing data, and even the most unfair algorithm leaves pretty much all users paying significantly less than today. So it's a bad idea to let that concern get in the way of getting something quick but meaningful out there.

I should also add that zero-byte compression is very easy to charge gas for because it has a fixed formula.
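
For instance, with plain zero-byte run-length compression the compressed size of a tx is a closed-form function of its own bytes, so a per-tx charge can be computed without seeing the rest of the block. A hedged sketch (constants are illustrative, not a fee spec):

```python
# Illustrative constants only: 16 gas per nonzero byte (EIP-2028), and each zero
# run compresses to a 2-byte marker + length pair. Runs longer than 255 bytes
# are ignored for brevity.
NONZERO_BYTE_GAS = 16
ZERO_RUN_GAS = 2 * 16


def zero_compressed_gas(tx: bytes) -> int:
    gas, in_run = 0, False
    for b in tx:
        if b == 0:
            if not in_run:
                gas += ZERO_RUN_GAS
                in_run = True
        else:
            gas += NONZERO_BYTE_GAS
            in_run = False
    return gas
```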