Closed webmaster128 closed 2 months ago
Rust compression crates benchmark, https://github.com/llogiq/compressbench.
From the results, it seems zstd would be a good option for our use case (in-memory compression of binary data). It also has a compression level, which would allows us to easily choose a good balance between compression ratio and compression speed.
lzzzz also looks impressive, particularly regarding decompression speed.
Documenting this for when we decide to do this task.
Some sorted benchmark results, for convenience:
pack:
$ cat benchresults/benches-pack.csv
bench,time,ns
compression/tar.pack,236.92 us,236920.00
compression/lzma-rs/xz.pack,292.64 us,292640.00
compression/lzma-rs/2.pack,301.65 us,301650.00
compression/yazi-Default.pack,114.25 ms,114250000.00
compression/snappy-framed.pack,12.330 ms,12330000.000
compression/zip.pack,12.564 ms,12564000.000
compression/flate2-6.pack,125.98 ms,125980000.00
compression/lz4-compression.pack,13.479 ms,13479000.000
compression/lz_fear.pack,13.951 ms,13951000.000
compression/deflate-Default.pack,161.69 ms,161690000.00
compression/zstd-level-5.pack,17.100 ms,17100000.000
compression/flate2-1.pack,18.100 ms,18100000.000
compression/flate2-7.pack,194.30 ms,194300000.00
compression/zstd-level-6.pack,19.477 ms,19477000.000
compression/yazi-BestSpeed.pack,25.976 ms,25976000.000
compression/deflate-Fast.pack,26.605 ms,26605000.000
compression/zstd-level-7.pack,28.139 ms,28139000.000
compression/flate2-2.pack,29.021 ms,29021000.000
compression/flate2-8.pack,304.64 ms,304640000.00
compression/zstd-level-8.pack,33.505 ms,33505000.000
compression/yazi-BestSize.pack,337.98 ms,337980000.00
compression/snap.pack,3.8431 ms,3843100.0000
compression/flate2-3.pack,41.904 ms,41904000.000
compression/lzzzz.pack,4.1939 ms,4193900.0000
compression/zstd-level-9.pack,43.882 ms,43882000.000
compression/flate2-4.pack,44.732 ms,44732000.000
compression/flate2-5.pack,57.348 ms,57348000.000
compression/lz4_flex.pack,6.4156 ms,6415600.0000
compression/zstd-level-1.pack,7.3524 ms,7352400.0000
compression/zstd-level-2.pack,7.4803 ms,7480300.0000
compression/zstd-level-3.pack,9.5165 ms,9516500.0000
compression/deflate-Best.pack,953.03 ms,953030000.00
compression/lzma-rs.pack,98.247 ms,98247000.000
compression/zstd-level-4.pack,9.9698 ms,9969800.0000
compression/zopfli.pack,59.815 s,59815000000.000
compression/brotli.pack,7.7180 s,7718000000.0000
$
unpack:
$ cat benchresults/benches-unpack.csv
bench,time,ns
compression/lz4_flex.unpack,5.8115 ns,5.8115
compression/lzzzz.unpack,7.3390 ns,7.3390
compression/lzma-rs/2.unpack,501.74 us,501740.00
compression/zstd-level-9.unpack,53.171 us,53171.000
compression/zstd-level-8.unpack,55.482 us,55482.000
compression/zstd-level-7.unpack,57.484 us,57484.000
compression/zstd-level-6.unpack,66.994 us,66994.000
compression/zstd-level-1.unpack,71.364 us,71364.000
compression/zstd-level-5.unpack,71.419 us,71419.000
compression/zstd-level-4.unpack,72.170 us,72170.000
compression/zstd-level-3.unpack,72.422 us,72422.000
compression/zstd-level-2.unpack,74.352 us,74352.000
compression/lzma-rs/xz.unpack,814.21 us,814210.00
compression/zip.unpack,1.1954 ms,1195400.0000
compression/deflate-Best.unpack,18.519 ms,18519000.000
compression/deflate-Default.unpack,19.197 ms,19197000.000
compression/lzma-rs.unpack,210.45 ms,210450000.00
compression/snap.unpack,2.1702 ms,2170200.0000
compression/deflate-Fast.unpack,24.436 ms,24436000.000
compression/snappy-framed.unpack.nocrc,2.6818 ms,2681800.0000
compression/yazi-BestSize.unpack,3.1504 ms,3150400.0000
compression/yazi-Default.unpack,3.2946 ms,3294600.0000
compression/yazi-BestSpeed.unpack,5.0333 ms,5033300.0000
compression/flate2-8.unpack,5.3297 ms,5329700.0000
compression/flate2-7.unpack,5.3943 ms,5394300.0000
compression/flate2-6.unpack,5.4818 ms,5481800.0000
compression/flate2-5.unpack,5.7901 ms,5790100.0000
compression/flate2-4.unpack,6.0030 ms,6003000.0000
compression/flate2-3.unpack,6.1878 ms,6187800.0000
compression/lz4-compression.unpack,6.4815 ms,6481500.0000
compression/lz_fear.unpack,6.8218 ms,6821800.0000
compression/flate2-2.unpack,7.1148 ms,7114800.0000
compression/brotli.unpack,7.2828 ms,7282800.0000
compression/flate2-1.unpack,7.6414 ms,7641400.0000
compression/snappy-framed.unpack.crc,9.5301 ms,9530100.0000
$
It's clear that, in terms of compression / decompression speed, the LZMA family of algorithms shine here.
Reference: compressbench fork.
This is super nice, thank you.
Could you add a column with the compression ratio for a module file of one of our contracts? It would be interesting to learn how much we can win here in general and how the algorithms compare not only on speed but also compression.
Sure. I didn't add it just because it's a little difficult to extract it from the benchmarks. But I'll do it.
Here are the updated results.
pack:
$ cat benchresults/benches-pack.csv
bench,time,size,compression_ratio,ns
compression/tar.pack,236.92 us,3209216,1.000,236920.00
compression/lzma-rs/xz.pack,292.64 us,3207448,1.000,292640.00
compression/lzma-rs/2.pack,301.65 us,3207394,1.000,301650.00
compression/snap.pack,3.8431 ms,993450,.309,3843100.0000
compression/lzzzz.pack,4.1939 ms,926487,.288,4193900.0000
compression/lz4_flex.pack,6.4156 ms,999937,.311,6415600.0000
compression/zstd-level-1.pack,7.3524 ms,502723,.156,7352400.0000
compression/zstd-level-2.pack,7.4803 ms,493719,.153,7480300.0000
compression/zstd-level-3.pack,9.5165 ms,483315,.150,9516500.0000
compression/zstd-level-4.pack,9.9698 ms,481933,.150,9969800.0000
compression/snappy-framed.pack,12.330 ms,993995,.309,12330000.000
compression/zip.pack,12.564 ms,55602,.017,12564000.000
compression/lz4-compression.pack,13.479 ms,1041702,.324,13479000.000
compression/lz_fear.pack,13.951 ms,926506,.288,13951000.000
compression/zstd-level-5.pack,17.100 ms,464012,.144,17100000.000
compression/flate2-1.pack,18.100 ms,705536,.219,18100000.000
compression/zstd-level-6.pack,19.477 ms,457444,.142,19477000.000
compression/yazi-BestSpeed.pack,25.976 ms,691183,.215,25976000.000
compression/deflate-Fast.pack,26.605 ms,692936,.216,26605000.000
compression/zstd-level-7.pack,28.139 ms,419632,.130,28139000.000
compression/flate2-2.pack,29.021 ms,619885,.193,29021000.000
compression/zstd-level-8.pack,33.505 ms,408464,.127,33505000.000
compression/flate2-3.pack,41.904 ms,549306,.171,41904000.000
compression/zstd-level-9.pack,43.882 ms,404551,.126,43882000.000
compression/flate2-4.pack,44.732 ms,546679,.170,44732000.000
compression/flate2-5.pack,57.348 ms,531571,.165,57348000.000
compression/lzma-rs.pack,98.247 ms,1131078,.352,98247000.000
compression/yazi-Default.pack,114.25 ms,511734,.159,114250000.00
compression/flate2-6.pack,125.98 ms,511734,.159,125980000.00
compression/deflate-Default.pack,161.69 ms,510688,.159,161690000.00
compression/flate2-7.pack,194.30 ms,505345,.157,194300000.00
compression/flate2-8.pack,304.64 ms,501494,.156,304640000.00
compression/yazi-BestSize.pack,337.98 ms,500247,.155,337980000.00
compression/deflate-Best.pack,953.03 ms,498054,.155,953030000.00
compression/brotli.pack,7.7180 s,301546,.094,7718000000.0000
compression/zopfli.pack,59.815 s,464437,.144,59815000000.000
$
unpack:
$ cat benchresults/benches-unpack.csv
bench,time,size,compression_ratio,ns
compression/lz4_flex.unpack,5.8115 ns,,NA,5.8115
compression/lzzzz.unpack,7.3390 ns,,NA,7.3390
compression/zstd-level-9.unpack,53.171 us,,NA,53171.000
compression/zstd-level-8.unpack,55.482 us,,NA,55482.000
compression/zstd-level-7.unpack,57.484 us,,NA,57484.000
compression/zstd-level-6.unpack,66.994 us,,NA,66994.000
compression/zstd-level-1.unpack,71.364 us,,NA,71364.000
compression/zstd-level-5.unpack,71.419 us,,NA,71419.000
compression/zstd-level-4.unpack,72.170 us,,NA,72170.000
compression/zstd-level-3.unpack,72.422 us,,NA,72422.000
compression/zstd-level-2.unpack,74.352 us,,NA,74352.000
compression/lzma-rs/2.unpack,501.74 us,,NA,501740.00
compression/lzma-rs/xz.unpack,814.21 us,,NA,814210.00
compression/zip.unpack,1.1954 ms,,NA,1195400.0000
compression/snap.unpack,2.1702 ms,,NA,2170200.0000
compression/snappy-framed.unpack.nocrc,2.6818 ms,,NA,2681800.0000
compression/yazi-BestSize.unpack,3.1504 ms,,NA,3150400.0000
compression/yazi-Default.unpack,3.2946 ms,,NA,3294600.0000
compression/yazi-BestSpeed.unpack,5.0333 ms,,NA,5033300.0000
compression/flate2-8.unpack,5.3297 ms,,NA,5329700.0000
compression/flate2-7.unpack,5.3943 ms,,NA,5394300.0000
compression/flate2-6.unpack,5.4818 ms,,NA,5481800.0000
compression/flate2-5.unpack,5.7901 ms,,NA,5790100.0000
compression/flate2-4.unpack,6.0030 ms,,NA,6003000.0000
compression/flate2-3.unpack,6.1878 ms,,NA,6187800.0000
compression/lz4-compression.unpack,6.4815 ms,,NA,6481500.0000
compression/lz_fear.unpack,6.8218 ms,,NA,6821800.0000
compression/flate2-2.unpack,7.1148 ms,,NA,7114800.0000
compression/brotli.unpack,7.2828 ms,,NA,7282800.0000
compression/flate2-1.unpack,7.6414 ms,,NA,7641400.0000
compression/snappy-framed.unpack.crc,9.5301 ms,,NA,9530100.0000
compression/deflate-Best.unpack,18.519 ms,,NA,18519000.000
compression/deflate-Default.unpack,19.197 ms,,NA,19197000.000
compression/deflate-Fast.unpack,24.436 ms,,NA,24436000.000
compression/lzma-rs.unpack,210.45 ms,,NA,210450000.00
$
In fact, adding the size allows us to quickly rule out some candidates (no compression at all, even inflating the files a little). I also fixed a bug with the numerical sort, so, the list is properly sorted by time now.
Provided that there are no errors in these benchmarks, lz4_flex
and lzzzz
both have impressive performance, both unpacking and packing.
I copied the CSVs to gist, which gives us better readability: https://gist.github.com/webmaster128/8c5518d0eabc5931a8e911515cf00303. lz4_flex
and lzzzz
seem to have stunning decrompression speeds. But compress down to 30% of the original size. Some alternatives can compress to 15% and less.
Yes. I was just thinking that to establish some kind of "absolute" measure of which algo is better, we could compute a weighted sum of the compression ratio and the (normalized) compression and decompression times.
To be meaningful, time normalization must be done not against the worst time, but against the mean / median time. Or, alternatively, against a user defined value: what is the maximum time you're willing to tolerate, trying to achieve the best compression ratio.
Doing that for 10 ms maximum time, removing the cases that don't report both (pack and unpack) benchmarks, and summing packing and unpacking times, we get:
$ egrep -v '^(bench|compression/(tar|snappy-framed|zopfli))\b' benchresults/benches.csv | sort -t, -k1 | sed 'N;s/\n/,/' | awk -F, '{print $1","$7+$14}' | sed 's/.pack,/,/' | sort -t, -gk2
compression/lzzzz,0.707391
compression/zstd-level-1,0.898376
compression/zstd-level-2,0.908465
compression/snap,0.91033
compression/lz4_flex,0.952561
compression/lzma-rs/2,1.08034
compression/zstd-level-3,1.10889
compression/lzma-rs/xz,1.11068
compression/zstd-level-4,1.1542
compression/zip,1.39294
compression/zstd-level-5,1.86114
compression/zstd-level-6,2.0964
compression/lz4-compression,2.32005
compression/lz_fear,2.36528
compression/flate2-1,2.79314
compression/zstd-level-7,2.94965
compression/yazi-BestSpeed,3.31593
compression/zstd-level-8,3.48305
compression/flate2-2,3.80658
compression/zstd-level-9,4.51952
compression/flate2-3,4.98018
compression/flate2-4,5.2435
compression/deflate-Fast,5.3201
compression/flate2-5,6.47881
compression/yazi-Default,11.9135
compression/flate2-6,13.3052
compression/deflate-Default,18.2477
compression/flate2-7,20.1264
compression/flate2-8,31.153
compression/lzma-rs,31.2217
compression/yazi-BestSize,34.268
compression/deflate-Best,97.3099
compression/brotli,772.622
$
I've found that, when you are willing to spend some more time, zip
and the zstd
family are better options that the lz
family.
It would also be interesting to weigh compression and decompression times differently.
Will polish this a little, and publish the latest changes during the week.
Could we sort the algorithms consistently with something like compression/crate_name/algorithm_identifier
as some crates contain different flavours with different characteristics.
E.g.
compression/zstd/level-5
compression/zstd/level-7
compression/deflate/Fast
compression/brotli
compression/lzzzz/lz4
compression/lzzzz/lz4_hc
Especially the last one would be an interesting addition. It is a more expensize compressor that uses the same (super fast) decompressor). See https://docs.rs/lzzzz/0.8.0/lzzzz/index.html and https://github.com/lz4/lz4#benchmarks.
Nice. We can extend the benchmark with those flavours, yes.
Updated results: https://gist.github.com/maurolacy/26bc3d1458f4422405ede4c9ba3f2b09
Closing because it turns out that the file system cache is MUCH faster these days. It loads moduls in 1-2ms from SSD and microseconds from NVMe
Right now we use 2 levels of module caching:
Module
sGiven the learnings from #722, we can make this 3 levels:
In 2., the modules are roughly
half thethe same size compared to 1.. If we can efficiently compress the serialization, we might get a great middle cache between the plainModule
s and the FS caching.The learnings from here might help improve the FS cache as well.