Closed JsBergbau closed 1 year ago
Some other figures. I realized later that I forgot to use ultra compression for the MediaWiki dump. It is very impressive how much better this mode is. On the other hand, `--ultra -22` is much slower than Winrar. `--long=31` seems to do the trick.
used zstd:
*** zstd command line interface 64-bits v1.5.2, by Yann Collet ***
MediaWiki MySQL dump: 1870 MB plaintext
- Winrar, compressed at maximum: 11.3 MB (!)
- ZSTD -T0 -12: 66.5 MB
- ZSTD -T0 -19: 41.95 MB
- ZSTD -T0 --ultra -22: 13.5 MB
- ZSTD -T0 -15 --long=31: 12.44 MB
- GZIP default compression level: 412.6 MB
So we really should figure out why Winrar is so much better.
On the other hand, there are also examples where ZSTD is even better than Winrar and where the different levels make little difference.
Matomo MySQL dump: 2948 MB plaintext
- Winrar at maximum compression level: 443.2 MB
- ZSTD -T0 -12: 451.6 MB
- ZSTD -T0 -15: 448.6 MB
- ZSTD -T0 -19: 408.7 MB
- ZSTD -T0 --ultra -22: 403.3 MB
- GZIP default compression level: 554.4 MB
Update:
./zstd -15 --long=31 AcronisTrueImage2021BootCD.iso
compresses to 175.9 MB, so even better than Winrar.
2048 MB is not much memory. So we should consider using longer windows for default.
Update: Another MySQL dump (typo3), 915.3 MB in size.
Using `zstd -T0 -15 --long=31` results in a 26.97 MB compressed file; `./zstd -T0 -15` results in 25.78 MB. So in this case the file is 4.6 % larger when using the longer compression window. Very strange.
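The 4.6 % figure and the overall compression factor can be re-derived from the sizes quoted above. The following is only a small sketch of that arithmetic; the function names are made up for illustration and are not part of zstd.

```python
def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Return how many times smaller the compressed file is than the original."""
    return original_mb / compressed_mb


def relative_increase(baseline_mb: float, other_mb: float) -> float:
    """Return how much larger other_mb is than baseline_mb, in percent."""
    return (other_mb - baseline_mb) / baseline_mb * 100.0


# Sizes reported for the typo3 dump (all in MB).
original = 915.3
without_long = 25.78  # zstd -T0 -15
with_long = 26.97     # zstd -T0 -15 --long=31

print(f"ratio without --long: x{compression_ratio(original, without_long):.1f}")
print(f"ratio with --long=31: x{compression_ratio(original, with_long):.1f}")
print(f"penalty of --long=31: {relative_increase(without_long, with_long):.1f} %")
```

Both runs compress the dump by a factor of more than 30, so the difference between them is small in absolute terms (about 1.2 MB).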
`--long` mode should always be positive for high compression modes (`btopt` and above), starting at level 16.
Below that point however, `--long` is more like a "bet", which tends to be fine in "general" cases, but can occasionally go wrong.
In this case for example, the compression factor is very high (more than x30!), which means regular matches are already very long and therefore competitive with the ones found by `--long`.
We can't default to window sizes larger than 8MB, because that is the max window size we say all decoders should support in our spec. So you must explicitly opt into a larger window size with `--long`.
"2048 MB is not much memory. So we should consider using longer windows for default."
On the contrary that is a MASSIVE amount of memory. If you had a server compressing on the fly to 100 clients, that would eat up 200GB of RAM. And then every client reading from those servers would need to support a 2GB window. The per-stream memory overhead to decompress zstd is an important limit that allows it to be used in a broader set of use cases.
Not everyone uses zstd primarily for archival storage. On the contrary, because it is rather fast at both compression and decompression it is often used for dynamic data compression or use cases where the compression has to keep up with multiple continuous inbound data streams.
WinRar is nearly exclusively used for archives, usually no more than one task at a time, and can require a larger chunk of memory.
Would you want your phone to allocate 2GB of memory just to download compressed app updates from the web in the background? The 8MB default limit is what lets zstd be an option for use cases like this -- it's a compromise that lets any standard/generic implementation operate in a small footprint no matter who compressed the source data.
Just beware: if you use `--long=31`, anyone who decompresses the data may need up to 2GB of RAM to do so. `--long=29` might be more than enough and requires 4x less RAM.
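To make the memory numbers in this comment concrete: the value passed to `--long` is the window log, so the window is 2 to that power in bytes. The sketch below only does this arithmetic. The 100-stream scenario is the one from the comment above, and the helper name is illustrative, not a zstd API.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def window_bytes(window_log: int) -> int:
    """Window size in bytes for a given zstd window log (2**window_log)."""
    return 1 << window_log


# 23 -> the 8 MB limit mentioned above, 27 -> what plain --long selects,
# 29 and 31 -> the values discussed in this thread.
for window_log in (23, 27, 29, 31):
    size = window_bytes(window_log)
    print(
        f"windowLog={window_log}: {size // MIB} MiB per stream, "
        f"{100 * size / GIB:.1f} GiB for 100 concurrent streams"
    )
```

This counts the window only; real compressor and decompressor contexts need some additional memory on top of it.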
Closing as there is no immediate action to be taken. We can't, and don't want to, default zstd to using more than 8MB of memory, unless you explicitly opt into it with `--long` or `--ultra`.
The only action we can reasonably take is to raise awareness of `--long`.
Given is the Acronis TrueImage 2021 Boot CD ISO image, file size 706 MB. Compressed with Winrar, the file size is 178 MB. ZSTD using maximum settings with PeaZip results in 393 MB; using
zstd --ultra -22 -T16 AcronisTrueImage2021BootCD.iso
results in 383 MB. This is still twice the size of the Winrar result. So Winrar must do some trick to get this massive compression.
pigz -v9 AcronisTrueImage2021BootCD.iso
results in 675 MB, so ZSTD is already a massive improvement compared to gzip, but I still consider this example worth examining to see what is going on and to improve zstd. You can download the ISO here: https://archive.org/details/acronis_2021 sha256sum a907788710997da7b413d49c8ab124019e836ca6552341e92a54b3d346472059