CosmWasm / cosmwasm

Framework for building smart contracts in Wasm for the Cosmos SDK
https://www.cosmwasm.com/
Apache License 2.0
1.07k stars 336 forks source link

Evaluate compressed module serialization #729

Closed webmaster128 closed 2 months ago

webmaster128 commented 3 years ago

Right now we use 2 levels of module caching:

Given the learnings from #722, we can make this 3 levels:

  1. In-memory caching (instantiation in 50 µs)
  2. Serialized in-memory caching (instantiation in 1-2ms)
  3. File system caching (instantiation in 50ms)

In 2., the modules are roughly half the the same size compared to 1.. If we can efficiently compress the serialization, we might get a great middle cache between the plain Modules and the FS caching.

The learnings from here might help improve the FS cache as well.

maurolacy commented 3 years ago

Rust compression crates.

Rust compression crates benchmark, https://github.com/llogiq/compressbench.

From the results, it seems zstd would be a good option for our use case (in-memory compression of binary data). It also has a compression level, which would allows us to easily choose a good balance between compression ratio and compression speed.

lzzzz also looks impressive, particularly regarding decompression speed.

Documenting this for when we decide to do this task.

maurolacy commented 3 years ago

Some sorted benchmark results, for convenience:

pack:

$ cat benchresults/benches-pack.csv 
bench,time,ns
compression/tar.pack,236.92 us,236920.00
compression/lzma-rs/xz.pack,292.64 us,292640.00
compression/lzma-rs/2.pack,301.65 us,301650.00
compression/yazi-Default.pack,114.25 ms,114250000.00
compression/snappy-framed.pack,12.330 ms,12330000.000
compression/zip.pack,12.564 ms,12564000.000
compression/flate2-6.pack,125.98 ms,125980000.00
compression/lz4-compression.pack,13.479 ms,13479000.000
compression/lz_fear.pack,13.951 ms,13951000.000
compression/deflate-Default.pack,161.69 ms,161690000.00
compression/zstd-level-5.pack,17.100 ms,17100000.000
compression/flate2-1.pack,18.100 ms,18100000.000
compression/flate2-7.pack,194.30 ms,194300000.00
compression/zstd-level-6.pack,19.477 ms,19477000.000
compression/yazi-BestSpeed.pack,25.976 ms,25976000.000
compression/deflate-Fast.pack,26.605 ms,26605000.000
compression/zstd-level-7.pack,28.139 ms,28139000.000
compression/flate2-2.pack,29.021 ms,29021000.000
compression/flate2-8.pack,304.64 ms,304640000.00
compression/zstd-level-8.pack,33.505 ms,33505000.000
compression/yazi-BestSize.pack,337.98 ms,337980000.00
compression/snap.pack,3.8431 ms,3843100.0000
compression/flate2-3.pack,41.904 ms,41904000.000
compression/lzzzz.pack,4.1939 ms,4193900.0000
compression/zstd-level-9.pack,43.882 ms,43882000.000
compression/flate2-4.pack,44.732 ms,44732000.000
compression/flate2-5.pack,57.348 ms,57348000.000
compression/lz4_flex.pack,6.4156 ms,6415600.0000
compression/zstd-level-1.pack,7.3524 ms,7352400.0000
compression/zstd-level-2.pack,7.4803 ms,7480300.0000
compression/zstd-level-3.pack,9.5165 ms,9516500.0000
compression/deflate-Best.pack,953.03 ms,953030000.00
compression/lzma-rs.pack,98.247 ms,98247000.000
compression/zstd-level-4.pack,9.9698 ms,9969800.0000
compression/zopfli.pack,59.815 s,59815000000.000
compression/brotli.pack,7.7180 s,7718000000.0000
$ 

unpack:

$ cat benchresults/benches-unpack.csv 
bench,time,ns
compression/lz4_flex.unpack,5.8115 ns,5.8115
compression/lzzzz.unpack,7.3390 ns,7.3390
compression/lzma-rs/2.unpack,501.74 us,501740.00
compression/zstd-level-9.unpack,53.171 us,53171.000
compression/zstd-level-8.unpack,55.482 us,55482.000
compression/zstd-level-7.unpack,57.484 us,57484.000
compression/zstd-level-6.unpack,66.994 us,66994.000
compression/zstd-level-1.unpack,71.364 us,71364.000
compression/zstd-level-5.unpack,71.419 us,71419.000
compression/zstd-level-4.unpack,72.170 us,72170.000
compression/zstd-level-3.unpack,72.422 us,72422.000
compression/zstd-level-2.unpack,74.352 us,74352.000
compression/lzma-rs/xz.unpack,814.21 us,814210.00
compression/zip.unpack,1.1954 ms,1195400.0000
compression/deflate-Best.unpack,18.519 ms,18519000.000
compression/deflate-Default.unpack,19.197 ms,19197000.000
compression/lzma-rs.unpack,210.45 ms,210450000.00
compression/snap.unpack,2.1702 ms,2170200.0000
compression/deflate-Fast.unpack,24.436 ms,24436000.000
compression/snappy-framed.unpack.nocrc,2.6818 ms,2681800.0000
compression/yazi-BestSize.unpack,3.1504 ms,3150400.0000
compression/yazi-Default.unpack,3.2946 ms,3294600.0000
compression/yazi-BestSpeed.unpack,5.0333 ms,5033300.0000
compression/flate2-8.unpack,5.3297 ms,5329700.0000
compression/flate2-7.unpack,5.3943 ms,5394300.0000
compression/flate2-6.unpack,5.4818 ms,5481800.0000
compression/flate2-5.unpack,5.7901 ms,5790100.0000
compression/flate2-4.unpack,6.0030 ms,6003000.0000
compression/flate2-3.unpack,6.1878 ms,6187800.0000
compression/lz4-compression.unpack,6.4815 ms,6481500.0000
compression/lz_fear.unpack,6.8218 ms,6821800.0000
compression/flate2-2.unpack,7.1148 ms,7114800.0000
compression/brotli.unpack,7.2828 ms,7282800.0000
compression/flate2-1.unpack,7.6414 ms,7641400.0000
compression/snappy-framed.unpack.crc,9.5301 ms,9530100.0000
$  

It's clear that, in terms of compression / decompression speed, the LZMA family of algorithms shine here.

Reference: compressbench fork.

webmaster128 commented 3 years ago

This is super nice, thank you.

Could you add a column with the compression ratio for a module file of one of our contracts? It would be interesting to learn how much we can win here in general and how the algorithms compare not only on speed but also compression.

maurolacy commented 3 years ago

Sure. I didn't add it just because it's a little difficult to extract it from the benchmarks. But I'll do it.

maurolacy commented 3 years ago

Here are the updated results.

pack:

$ cat benchresults/benches-pack.csv 
bench,time,size,compression_ratio,ns
compression/tar.pack,236.92 us,3209216,1.000,236920.00
compression/lzma-rs/xz.pack,292.64 us,3207448,1.000,292640.00
compression/lzma-rs/2.pack,301.65 us,3207394,1.000,301650.00
compression/snap.pack,3.8431 ms,993450,.309,3843100.0000
compression/lzzzz.pack,4.1939 ms,926487,.288,4193900.0000
compression/lz4_flex.pack,6.4156 ms,999937,.311,6415600.0000
compression/zstd-level-1.pack,7.3524 ms,502723,.156,7352400.0000
compression/zstd-level-2.pack,7.4803 ms,493719,.153,7480300.0000
compression/zstd-level-3.pack,9.5165 ms,483315,.150,9516500.0000
compression/zstd-level-4.pack,9.9698 ms,481933,.150,9969800.0000
compression/snappy-framed.pack,12.330 ms,993995,.309,12330000.000
compression/zip.pack,12.564 ms,55602,.017,12564000.000
compression/lz4-compression.pack,13.479 ms,1041702,.324,13479000.000
compression/lz_fear.pack,13.951 ms,926506,.288,13951000.000
compression/zstd-level-5.pack,17.100 ms,464012,.144,17100000.000
compression/flate2-1.pack,18.100 ms,705536,.219,18100000.000
compression/zstd-level-6.pack,19.477 ms,457444,.142,19477000.000
compression/yazi-BestSpeed.pack,25.976 ms,691183,.215,25976000.000
compression/deflate-Fast.pack,26.605 ms,692936,.216,26605000.000
compression/zstd-level-7.pack,28.139 ms,419632,.130,28139000.000
compression/flate2-2.pack,29.021 ms,619885,.193,29021000.000
compression/zstd-level-8.pack,33.505 ms,408464,.127,33505000.000
compression/flate2-3.pack,41.904 ms,549306,.171,41904000.000
compression/zstd-level-9.pack,43.882 ms,404551,.126,43882000.000
compression/flate2-4.pack,44.732 ms,546679,.170,44732000.000
compression/flate2-5.pack,57.348 ms,531571,.165,57348000.000
compression/lzma-rs.pack,98.247 ms,1131078,.352,98247000.000
compression/yazi-Default.pack,114.25 ms,511734,.159,114250000.00
compression/flate2-6.pack,125.98 ms,511734,.159,125980000.00
compression/deflate-Default.pack,161.69 ms,510688,.159,161690000.00
compression/flate2-7.pack,194.30 ms,505345,.157,194300000.00
compression/flate2-8.pack,304.64 ms,501494,.156,304640000.00
compression/yazi-BestSize.pack,337.98 ms,500247,.155,337980000.00
compression/deflate-Best.pack,953.03 ms,498054,.155,953030000.00
compression/brotli.pack,7.7180 s,301546,.094,7718000000.0000
compression/zopfli.pack,59.815 s,464437,.144,59815000000.000
$ 

unpack:

$ cat benchresults/benches-unpack.csv 
bench,time,size,compression_ratio,ns
compression/lz4_flex.unpack,5.8115 ns,,NA,5.8115
compression/lzzzz.unpack,7.3390 ns,,NA,7.3390
compression/zstd-level-9.unpack,53.171 us,,NA,53171.000
compression/zstd-level-8.unpack,55.482 us,,NA,55482.000
compression/zstd-level-7.unpack,57.484 us,,NA,57484.000
compression/zstd-level-6.unpack,66.994 us,,NA,66994.000
compression/zstd-level-1.unpack,71.364 us,,NA,71364.000
compression/zstd-level-5.unpack,71.419 us,,NA,71419.000
compression/zstd-level-4.unpack,72.170 us,,NA,72170.000
compression/zstd-level-3.unpack,72.422 us,,NA,72422.000
compression/zstd-level-2.unpack,74.352 us,,NA,74352.000
compression/lzma-rs/2.unpack,501.74 us,,NA,501740.00
compression/lzma-rs/xz.unpack,814.21 us,,NA,814210.00
compression/zip.unpack,1.1954 ms,,NA,1195400.0000
compression/snap.unpack,2.1702 ms,,NA,2170200.0000
compression/snappy-framed.unpack.nocrc,2.6818 ms,,NA,2681800.0000
compression/yazi-BestSize.unpack,3.1504 ms,,NA,3150400.0000
compression/yazi-Default.unpack,3.2946 ms,,NA,3294600.0000
compression/yazi-BestSpeed.unpack,5.0333 ms,,NA,5033300.0000
compression/flate2-8.unpack,5.3297 ms,,NA,5329700.0000
compression/flate2-7.unpack,5.3943 ms,,NA,5394300.0000
compression/flate2-6.unpack,5.4818 ms,,NA,5481800.0000
compression/flate2-5.unpack,5.7901 ms,,NA,5790100.0000
compression/flate2-4.unpack,6.0030 ms,,NA,6003000.0000
compression/flate2-3.unpack,6.1878 ms,,NA,6187800.0000
compression/lz4-compression.unpack,6.4815 ms,,NA,6481500.0000
compression/lz_fear.unpack,6.8218 ms,,NA,6821800.0000
compression/flate2-2.unpack,7.1148 ms,,NA,7114800.0000
compression/brotli.unpack,7.2828 ms,,NA,7282800.0000
compression/flate2-1.unpack,7.6414 ms,,NA,7641400.0000
compression/snappy-framed.unpack.crc,9.5301 ms,,NA,9530100.0000
compression/deflate-Best.unpack,18.519 ms,,NA,18519000.000
compression/deflate-Default.unpack,19.197 ms,,NA,19197000.000
compression/deflate-Fast.unpack,24.436 ms,,NA,24436000.000
compression/lzma-rs.unpack,210.45 ms,,NA,210450000.00
$ 

In fact, adding the size allows us to quickly rule out some candidates (no compression at all, even inflating the files a little). I also fixed a bug with the numerical sort, so, the list is properly sorted by time now.

Provided that there are no errors in these benchmarks, lz4_flex and lzzzz both have impressive performance, both unpacking and packing.

webmaster128 commented 3 years ago

I copied the CSVs to gist, which gives us better readability: https://gist.github.com/webmaster128/8c5518d0eabc5931a8e911515cf00303. lz4_flex and lzzzz seem to have stunning decrompression speeds. But compress down to 30% of the original size. Some alternatives can compress to 15% and less.

maurolacy commented 3 years ago

Yes. I was just thinking that to establish some kind of "absolute" measure of which algo is better, we could compute a weighted sum of the compression ratio and the (normalized) compression and decompression times.

To be meaningful, time normalization must be done not against the worst time, but against the mean / median time. Or, alternatively, against a user defined value: what is the maximum time you're willing to tolerate, trying to achieve the best compression ratio.

maurolacy commented 3 years ago

Doing that for 10 ms maximum time, removing the cases that don't report both (pack and unpack) benchmarks, and summing packing and unpacking times, we get:

$ egrep -v '^(bench|compression/(tar|snappy-framed|zopfli))\b' benchresults/benches.csv | sort -t, -k1 | sed 'N;s/\n/,/' | awk -F, '{print $1","$7+$14}' | sed 's/.pack,/,/' | sort -t, -gk2
compression/lzzzz,0.707391
compression/zstd-level-1,0.898376
compression/zstd-level-2,0.908465
compression/snap,0.91033
compression/lz4_flex,0.952561
compression/lzma-rs/2,1.08034
compression/zstd-level-3,1.10889
compression/lzma-rs/xz,1.11068
compression/zstd-level-4,1.1542
compression/zip,1.39294
compression/zstd-level-5,1.86114
compression/zstd-level-6,2.0964
compression/lz4-compression,2.32005
compression/lz_fear,2.36528
compression/flate2-1,2.79314
compression/zstd-level-7,2.94965
compression/yazi-BestSpeed,3.31593
compression/zstd-level-8,3.48305
compression/flate2-2,3.80658
compression/zstd-level-9,4.51952
compression/flate2-3,4.98018
compression/flate2-4,5.2435
compression/deflate-Fast,5.3201
compression/flate2-5,6.47881
compression/yazi-Default,11.9135
compression/flate2-6,13.3052
compression/deflate-Default,18.2477
compression/flate2-7,20.1264
compression/flate2-8,31.153
compression/lzma-rs,31.2217
compression/yazi-BestSize,34.268
compression/deflate-Best,97.3099
compression/brotli,772.622
$ 

I've found that, when you are willing to spend some more time, zip and the zstd family are better options that the lz family.

It would also be interesting to weigh compression and decompression times differently.

Will polish this a little, and publish the latest changes during the week.

webmaster128 commented 3 years ago

Could we sort the algorithms consistently with something like compression/crate_name/algorithm_identifier as some crates contain different flavours with different characteristics.

E.g.

compression/zstd/level-5
compression/zstd/level-7
compression/deflate/Fast
compression/brotli
compression/lzzzz/lz4
compression/lzzzz/lz4_hc

Especially the last one would be an interesting addition. It is a more expensize compressor that uses the same (super fast) decompressor). See https://docs.rs/lzzzz/0.8.0/lzzzz/index.html and https://github.com/lz4/lz4#benchmarks.

maurolacy commented 3 years ago

Nice. We can extend the benchmark with those flavours, yes.

maurolacy commented 3 years ago

Updated results: https://gist.github.com/maurolacy/26bc3d1458f4422405ede4c9ba3f2b09

webmaster128 commented 2 months ago

Closing because it turns out that the file system cache is MUCH faster these days. It loads moduls in 1-2ms from SSD and microseconds from NVMe