ethereum-optimism / optimistic-specs

Optimism Bedrock is a protocol that strives to be an extremely simple optimistic rollup that maintains 1:1 compatibility with Ethereum

Support calldata compression #10

Closed: karlfloersch closed this issue 2 years ago

karlfloersch commented 3 years ago

Background

We can get significant calldata savings by supporting compression. This can be as simple as zero-byte compression plus a lookup table. We should make it an objective to be able to cleanly fit this into the architecture.
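
For concreteness, a minimal sketch of the kind of zero-byte compression being floated here (runs of zeros collapse to a 0x00 marker plus a run length; the encoding and names are illustrative, not a proposed format):

```python
def compress_zero_bytes(data: bytes) -> bytes:
    """Replace each run of zero bytes with a 0x00 marker followed by the run length."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0x00, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress_zero_bytes(data: bytes) -> bytes:
    """Inverse of compress_zero_bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])  # bytes(n) is n zero bytes
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


# ABI-encoded calldata is full of zero padding, so even this trivial scheme helps.
# Example: ERC-20 transfer(address,uint256) calldata, 68 bytes -> 36 bytes.
calldata = (bytes.fromhex("a9059cbb")
            + 12 * b"\x00" + bytes.fromhex("ab" * 20)
            + 24 * b"\x00" + bytes.fromhex("0de0b6b3a7640000"))
assert decompress_zero_bytes(compress_zero_bytes(calldata)) == calldata
```

Note that an isolated zero byte expands from one byte to two under this scheme, which already hints at the cost-accounting questions discussed further down.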

smartcontracts commented 3 years ago

Yeah, this is crucial. We can use pretty much any compression algorithm as long as we can implement it on-chain.

karlfloersch commented 2 years ago

We should run numbers on how much simple stateless compression could buy us & consider adding it to block derivation if it's not expensive & saves a bunch of calldata.

tynes commented 2 years ago

Some related work:

karlfloersch commented 2 years ago

I think applying a very simple compression algorithm to all transactions could be huge & not impose a lot of cost on the protocol. We absolutely need to do some benchmarking. Vitalik has even written a simple compression algo & also has suggested using Snappy compression. We should dedicate a couple days to determining the performance characteristics of various compression algorithms & their savings in calldata size.
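
For a first pass at those numbers, something along these lines could be run over real L1 calldata (stdlib zlib/lzma below; Snappy would need the third-party python-snappy package; the synthetic input only stands in for real transaction batches):

```python
import lzma
import os
import time
import zlib


def sample_batch(n_txs: int = 500) -> bytes:
    """Synthetic ABI-style batch: selector + two zero-padded 32-byte args per tx."""
    txs = []
    for _ in range(n_txs):
        txs.append(os.urandom(4)                    # selector
                   + 12 * b"\x00" + os.urandom(20)  # address argument
                   + 28 * b"\x00" + os.urandom(4))  # small uint256 argument
    return b"".join(txs)


def bench(name, compress, data):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name:>8}: {len(data)} -> {len(out)} bytes "
          f"({len(out) / len(data):.2%}), {len(data) / elapsed / 1e6:.1f} MB/s")


data = sample_batch()
bench("zlib -1", lambda d: zlib.compress(d, 1), data)
bench("zlib -9", lambda d: zlib.compress(d, 9), data)
bench("lzma", lzma.compress, data)
```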

Here's a list of considerations that I've been thinking of as we explore compression:

  1. Adding compression makes tx fees a little more challenging. The L1 gas cost of a transaction goes from being trivial to compute (based on the size of the tx and the number of zero bytes) to being a bit trickier to compute (you may have to attempt to compress the tx to determine the L1 gas cost; see the sketch after this list).
  2. Are there ways to create transactions which are super difficult to compress or decompress? We definitely need to audit the compression algorithms for these sorts of properties.
  3. Is compression better to add at a 2nd layer? E.g. https://github.com/vyperlang/vyper/issues/2542 -- My guess is that complex compression is better to add at a 2nd layer, but adding simple compression would be huge for tx costs (at least in the near term, considering how terribly inefficient ABI encoding is).
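
To illustrate point 1, a hedged sketch of what fee computation starts to look like once compression is in the loop (zlib as a stand-in for the real algorithm, and the 16/4 per-byte costs are just the EIP-2028 calldata constants used for illustration):

```python
import zlib

# Illustrative constants: EIP-2028 calldata pricing.
ZERO_BYTE_GAS = 4
NONZERO_BYTE_GAS = 16


def l1_gas_uncompressed(tx: bytes) -> int:
    """Today: trivial, a function of the raw tx bytes alone."""
    zeros = tx.count(0)
    return zeros * ZERO_BYTE_GAS + (len(tx) - zeros) * NONZERO_BYTE_GAS


def l1_gas_compressed(tx: bytes) -> int:
    """With compression: the fee estimate has to actually run the compressor."""
    compressed = zlib.compress(tx, 9)
    zeros = compressed.count(0)
    return zeros * ZERO_BYTE_GAS + (len(compressed) - zeros) * NONZERO_BYTE_GAS
```
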
norswap commented 2 years ago

@karlfloersch

  1. Yes, we should try to wargame whether we can charge based on uncompressed calldata, or whether adversarial input could be a significant DoS/griefing factor. Intuitively, it seems hard to avoid that it will be possible to submit data that doesn't compress at all, so if we manage 50% compression on average, there is a 2x calldata cost amplification attack. We should absolutely avoid cases where the "compression" could be bigger than the original (I expect sensible algorithms to avoid this, but we should check; see the sketch after this list).
  2. Afaik, most popular compression algorithms are O(n) (which makes sense, as they're often used to compress gigantic corpuses). But yes, we should check.
  3. Guesses seconded. This should also be benchmarked.
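
A quick way to put numbers on the worst case from point 1 (zlib as a stand-in; random bytes are essentially incompressible, so the output can come out slightly larger than the input):

```python
import os
import zlib

typical = (28 * b"\x00" + os.urandom(4)) * 100  # ABI-style data, compresses well
adversarial = os.urandom(len(typical))          # random bytes, incompressible

for name, data in [("typical", typical), ("adversarial", adversarial)]:
    out = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(out)} bytes ({len(out) / len(data):.2f}x)")
```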

Some more notes:

protolambda commented 2 years ago

Regarding snappy compression: eth2 uses it in different places, and there are some fun edge-cases / security concerns!

Some of these may also apply to other compression methods; don't trust any method to do the right thing by default.

And intuitively I think we want to optimize for monotonous data (32-byte copy instructions pointing to previously decompressed data, maybe special handling of 0x00 and 0xff), and maybe not prioritize run-length encoding like some compression algorithms do. Curious to see some stats of different compression techniques on L1 transaction history!
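
As a starting point for those stats, a small pass like this (illustrative, meant to be fed real L1 calldata) would show how much of a batch is 0x00/0xff and how many 32-byte words repeat earlier ones:

```python
def batch_stats(data: bytes) -> dict:
    """Rough byte- and word-level stats for a batch of calldata."""
    words = [data[i:i + 32] for i in range(0, len(data), 32)]
    seen, repeated = set(), 0
    for w in words:
        if w in seen:
            repeated += 1  # could be encoded as a copy of an earlier word
        seen.add(w)
    return {
        "zero_byte_fraction": data.count(0x00) / max(len(data), 1),
        "ff_byte_fraction": data.count(0xFF) / max(len(data), 1),
        "repeated_word_fraction": repeated / max(len(words), 1),
    }
```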

vbuterin commented 2 years ago

Regarding adversarial input, one thing that should definitely be done at some point is charging transactions gas based on their contribution to the compressed size, and not based on their uncompressed size. This way adversarial input would not break anything economically.

If we use my zero byte compression algo, then this is easy, because you can just directly reduce the gas cost for sequences of zero bytes. For snappy you can't quite do that. But if the sequencer is trusted to set the fee, the sequencer could do something else that's pretty simple: track the total length of the compressed block while building it up and charge each tx for its marginal contribution to the compressed block size.
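
A hedged sketch of that marginal-contribution idea, with zlib standing in for whatever algorithm is actually chosen (Z_SYNC_FLUSH just makes the per-tx contribution observable):

```python
import zlib


class BlockCompressor:
    """Sequencer-side sketch: one streaming compressor per in-progress block."""

    def __init__(self):
        self._comp = zlib.compressobj(level=9)  # zlib as a stand-in
        self.compressed_size = 0

    def add_tx(self, tx: bytes) -> int:
        """Append a tx to the compressed block and return its marginal compressed size."""
        chunk = self._comp.compress(tx) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        self.compressed_size += len(chunk)
        return len(chunk)  # charge the tx's L1 data fee in proportion to this


# Later txs can back-reference earlier ones, so repeated data tends to get
# cheaper as the block fills up:
block = BlockCompressor()
tx = 28 * b"\x00" + b"\xde\xad\xbe\xef" + 31 * b"\x00" + b"\x01"
print(block.add_tx(tx), block.add_tx(tx))
```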

norswap commented 2 years ago

@protolambda @vbuterin Is there some rationale for choosing Snappy for the consensus layer? I assume its high throughput must be a big part of that?

In the case of calldata compression, we probably want to optimize for compression ratio. In the benchmarks, even the slowest algorithms hover at ~1 MB/s compression throughput, and that seems plenty for the foreseeable future (not to mention there are algorithms with great compression ratios and up to two orders of magnitude more throughput). We'll need to confirm how these numbers hold up with actual calldata.

@vbuterin Isn't charging marginal contribution to block size somewhat unfair to people whose transactions are placed at the start of the "compressed unit" (a batch?), where the opportunity for compression is small?

It also means any kind of fee estimation has to be run through the sequencer, or at least a node that tracks the sequencer closely. This should be doable. A good property of charging marginal block-size contribution is that it is a number expected only to go down as the compressed unit grows, so estimation errors are only overestimations, which are safe. However, that only holds if your transaction lands in the compressed unit that was ongoing at the time of estimation, and not at the start of a fresh one.

Speaking of "compressed unit", a batch of transactions is a natural fit, but maybe it sacrifices compression opportunity by virtue of being too small. We could consider compressing across multiple batches, up to a certain maximum size. This requires a compression algorithm that is "streaming", i.e. one that does not need to pass over the whole input before starting to emit the compressed stream.

It might even be possible to do "rolling window compression". For instance Snappy compresses by referencing segments of the decompressed stream. We could constrain the compression algorithm to only consider the last X bytes of the decompressed stream.
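
For what it's worth, DEFLATE already works this way: back-references are limited to a fixed window of the decompressed stream, so capping the window size caps the history a node has to keep. A sketch using zlib as a stand-in:

```python
import zlib

WINDOW_BITS = 12  # back-references limited to the last 2**12 = 4 KiB of history

compressor = zlib.compressobj(level=9, wbits=WINDOW_BITS)
decompressor = zlib.decompressobj(wbits=WINDOW_BITS)


def add_batch(batch: bytes) -> bytes:
    """Compress a batch as part of one long rolling stream (no per-batch reset)."""
    return compressor.compress(batch) + compressor.flush(zlib.Z_SYNC_FLUSH)


def read_batch(chunk: bytes) -> bytes:
    return decompressor.decompress(chunk)


batch = 31 * b"\x00" + b"\x2a" + 32 * b"\xff"
assert read_batch(add_batch(batch)) == batch
```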

Advantages of rolling window compression:

Disadvantages:


Other random remarks:

vbuterin commented 2 years ago

> @vbuterin Isn't charging marginal contribution to block size somewhat unfair to people whose transactions are placed at the start of the "compressed unit" (a batch?), where the opportunity for compression is small?

My honest answer to this is "meh, whatever".

Charging gas fairly for compressed data is a much harder problem than just compressing data, and even the most unfair algorithm leaves pretty much all users paying significantly less than today. So it's a bad idea to let that concern get in the way of getting something quick but meaningful out there.

I should also add that zero-byte compression is very easy to charge gas for because it has a fixed formula.
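
For instance, with plain zero-byte run-length compression the compressed size of a tx is a closed-form function of its own bytes, so a per-tx charge can be computed without seeing the rest of the block. A hedged sketch (constants are illustrative, not a fee spec):

```python
# Illustrative constants only: 16 gas per nonzero byte (EIP-2028), and each zero
# run compresses to a 2-byte marker + length pair. Runs longer than 255 bytes
# are ignored for brevity.
NONZERO_BYTE_GAS = 16
ZERO_RUN_GAS = 2 * 16


def zero_compressed_gas(tx: bytes) -> int:
    gas, in_run = 0, False
    for b in tx:
        if b == 0:
            if not in_run:
                gas += ZERO_RUN_GAS
                in_run = True
        else:
            gas += NONZERO_BYTE_GAS
            in_run = False
    return gas
```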