FoldingAtHome / fah-client-bastet

Folding@home client, code named Bastet
GNU General Public License v3.0

Replace bzip2 #143

Open Artoria2e5 opened 1 year ago

Artoria2e5 commented 1 year ago

bzip2 is ancient. It is slow to decompress and does not provide the best compression ratio. Most projects have switched to something else; Fedora's comparison may be useful here. Since this is a new version of FAH, it might finally be time to also change the compression on the tarball.

Here is some timing and size data for fahcore-22-windows-64bit-release-0.0.20. The -T0 option tells xz to use all available cores; decompression is tested both single-threaded and multithreaded.

$ time bzip2 -dk fahcore-22-windows-64bit-release-0.0.20.tar.bz2
real    0m13.048s
user    0m0.000s
sys     0m0.000s

$ xz -T0 -k -v --x86 --lzma2=preset=6 fahcore-22-windows-64bit-release-0.0.20.tar
fahcore-22-windows-64bit-release-0.0.20.tar (1/1)
  100 %       116.5 MiB / 236.3 MiB = 0.493    15 MiB/s       0:15

$ ls -l fah*
-rw-r--r-- 1 arthu arthu 247808000 Sep  1 15:07 fahcore-22-windows-64bit-release-0.0.20.tar
-rw-r--r-- 1 arthu arthu 156444974 Sep  1 15:08 fahcore-22-windows-64bit-release-0.0.20.tar.bz2
-rw-r--r-- 1 arthu arthu 122121176 Sep  1 15:07 fahcore-22-windows-64bit-release-0.0.20.tar.xz

$ rm  fahcore-22-windows-64bit-release-0.0.20.tar

$ time xz -dk fahcore-22-windows-64bit-release-0.0.20.tar.xz
real    0m7.548s
user    0m0.000s
sys     0m0.000s

$ rm  fahcore-22-windows-64bit-release-0.0.20.tar

$ time xz -T0 -dk fahcore-22-windows-64bit-release-0.0.20.tar.xz
real    0m1.418s
user    0m0.000s
sys     0m0.000s

$ zstd fahcore-22-windows-64bit-release-0.0.20.tar
fahcore-22-windows-64bit-release-0.0.20.tar : 59.84%   (   236 MiB =>    141 MiB, fahcore-22-windows-64bit-release-0.0.20.tar.zst)

$ ls -l fahcore-22-windows-64bit-release-0.0.20.tar.zst
-rw-r--r-- 1 arthu arthu 148296926 Sep  1 15:07 fahcore-22-windows-64bit-release-0.0.20.tar.zst

$ rm  fahcore-22-windows-64bit-release-0.0.20.tar

$ time zstd -d fahcore-22-windows-64bit-release-0.0.20.tar.zst
fahcore-22-windows-64bit-release-0.0.20.tar.zst: 247808000 bytes

real    0m0.384s
user    0m0.156s
sys     0m0.125s

Both zstd and xz with BCJ compress better than bzip2 and decompress faster. The zstd archive is only slightly smaller (about 5%) but decompresses very fast (34×). The xz archive is significantly smaller (about 22%) but decompresses only ~70% faster single-threaded.
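For anyone who wants to recheck the ratios, here is a minimal sketch that recomputes them from the byte counts and wall-clock times in the transcript above. The dictionary and function names are made up for illustration, and the timings are single runs on one machine, so treat the results as rough.

```python
# Figures copied from the shell transcript above:
# archive sizes in bytes, decompression wall-clock ("real") times in seconds.
TAR_SIZE = 247_808_000
ARCHIVE_SIZE = {"bz2": 156_444_974, "xz": 122_121_176, "zst": 148_296_926}
DECOMP_SECONDS = {"bz2": 13.048, "xz": 7.548, "xz -T0": 1.418, "zst": 0.384}


def compression_ratio(fmt):
    """Compressed size divided by the uncompressed tar size."""
    return ARCHIVE_SIZE[fmt] / TAR_SIZE


def size_saving_vs_bzip2(fmt):
    """Fraction by which this archive is smaller than the .tar.bz2."""
    return 1 - ARCHIVE_SIZE[fmt] / ARCHIVE_SIZE["bz2"]


def speedup_vs_bzip2(fmt):
    """How many times faster decompression is than bzip2 -d."""
    return DECOMP_SECONDS["bz2"] / DECOMP_SECONDS[fmt]


if __name__ == "__main__":
    for fmt in ARCHIVE_SIZE:
        print(f"{fmt:7s} ratio {compression_ratio(fmt):.3f}, "
              f"{size_saving_vs_bzip2(fmt):6.1%} smaller than bz2")
    for fmt in DECOMP_SECONDS:
        print(f"{fmt:7s} decompresses {speedup_vs_bzip2(fmt):5.1f}x as fast as bz2")
```

By this arithmetic, multithreaded xz lands at roughly 9× the bzip2 decompression speed, between single-threaded xz and zstd.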


One slight issue with zstd is that it requires a PyPI module, python-zstandard, whereas xz is covered by the built-in lzma module. Cbang, which is currently used to handle tar and bz2, also has neither zstd nor xz support; see https://github.com/CauldronDevelopmentLLC/cbang/blob/master/src/cbang/iostream/CompressionFilter.h.
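To illustrate the Python side only, here is a rough sketch of writing and reading a .tar.xz with just the standard library. The filter chain is meant to mirror the `--x86 --lzma2=preset=6` options used above. The helper names and the in-memory layout are hypothetical and are not taken from any FAH tooling.

```python
import io
import lzma
import tarfile

# Filter chain intended to mirror `xz --x86 --lzma2=preset=6`:
# the x86 BCJ filter first, then LZMA2 at preset 6.
XZ_FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def pack_tar_xz(members):
    """Build a .tar.xz in memory from a {name: bytes} mapping (hypothetical helper)."""
    buf = io.BytesIO()
    with lzma.open(buf, "wb", format=lzma.FORMAT_XZ, filters=XZ_FILTERS) as xz:
        with tarfile.open(fileobj=xz, mode="w") as tar:
            for name, data in members.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_tar_xz(blob):
    """Return {name: bytes} for every regular file in a .tar.xz blob.

    No filter chain is passed here: the decoder reads it from the xz stream.
    """
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out


if __name__ == "__main__":
    files = {"core.bin": bytes(range(256)) * 64, "README.txt": b"hello\n"}
    blob = pack_tar_xz(files)
    assert unpack_tar_xz(blob) == files
    print(f"packed {sum(map(len, files.values()))} bytes into {len(blob)}")
```

The reading half needs nothing beyond `tarfile`, so a switch to xz would not add a dependency for Python consumers; this says nothing about the cbang side.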

marcosfrm commented 2 months ago

cbang already links three compression libraries (zlib, libbz2, liblz4). If another library is to be added, it should be libzstd, as it renders the three currently supported algorithms (plus lzma) obsolete in a way, and in addition can be used for HTTP content-encoding.

Artoria2e5 commented 2 months ago

I do not believe zstd renders LZMA obsolete, because it does not cover the "compress once but do it really well" niche well enough. That niche used to be bzip2's territory, but LZMA now does it both faster and better. Zstd is about being fast both ways while still compressing better than the older formats. And yes, HTTP is about being fast both ways too.

https://quixdb.github.io/squash-benchmark/#ratio-vs-decompression (I do believe the zstd times are a little broken here!)

[that said, zstd is smaller than bzip2 here too]

marcosfrm commented 2 months ago

zstd is good overall, delivering compression similar to lz4 at low levels, equivalent to gzip and bzip2 at intermediate levels, and close to lzma at high levels. Additionally, it's a well-maintained codebase -- something to consider after the xz backdoor fiasco.

By the way, zstd also has a -T0 option.