ckolivas / lrzip

Long Range Zip
http://lrzip.kolivas.org
GNU General Public License v2.0

Feature request: Output ZPAQ formatted files. #168

Closed: ghost closed 4 years ago

ghost commented 4 years ago

Most of the time the user would compress a directory instead of a single file, and extract some (not all) of the files in an archive. The ZPAQ format allows "solid" archives, so you can still have deduplication across the whole archive while extracting only the files you need. The archive would still be compatible with ZPAQ because the decompression code is already in the archive. Sure, it is not streamable, but I do not think that being streamable makes much sense when you archive files.

pete4abw commented 4 years ago

Try:

tar --use-compress-program="lrzip -z" -cf file.tar.lrz somedir

Then, to extract a single file:

tar -I lrzip -xf file.tar.lrz somedir/file

Be sure to quote "lrzip -z" so that tar receives it as a single argument.
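To illustrate why the quoting matters, here is a minimal Python sketch that uses the standard library's shlex module to split both forms of the command the way a POSIX shell would. It does not run tar or lrzip. The variable and function names are chosen only for this illustration.

```python
import shlex

# The command from the comment above, with and without the quotes.
quoted = 'tar --use-compress-program="lrzip -z" -cf file.tar.lrz somedir'
unquoted = 'tar --use-compress-program=lrzip -z -cf file.tar.lrz somedir'


def argv_for(command_line):
    """Split a command line into arguments using POSIX shell quoting rules."""
    return shlex.split(command_line)


quoted_argv = argv_for(quoted)
unquoted_argv = argv_for(unquoted)

# With quotes, "lrzip -z" stays inside one argument.
print(quoted_argv)
# Without quotes, -z is split off and reaches tar as a separate argument.
print(unquoted_argv)
```

In the quoted form the compress program and its -z flag travel together in one argument. In the unquoted form tar would see -z as an option of its own rather than as part of the lrzip command.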

ghost commented 4 years ago

But Windows does not have tar. Also not every version of tar has --use-compress-program. ZPAQ is one single executable.

ckolivas commented 4 years ago

Lrzip doesn't have any Windows support.

pete4abw commented 4 years ago

The lrzip file format is unique and distinct, with its own header. Stream 0 and stream 1 of each chunk each have their own header format as well. No matter what, lrzip files may only be decompressed by lrzip. See Magic Header. If you want a ZPAQ archive, then you must use ZPAQ to create it or run it through tar as explained above. The truth is, given the extra time ZPAQ takes to compress and decompress, it does not compare that favorably with lrzip.

I've done this before, but here is a new analysis comparing lrzip and zpaq. The zpaq version is 7.15, whose library is an improvement on the one used in this version of lrzip; lrzip 0.7x uses the most recent zpaq library.

The following were done using lrzip default options. zpaq using -m4. I've highlighted the best and worst performers.

| File | Size | Compression | Compressed Size | Time to Compress | Compression Ratio | MB/s |
|---|---|---|---|---|---|---|
| linux-5.4.y.tar | 1,639,557,120 | tar | N/A | N/A | N/A | N/A |
| lrzip 0.631 | largest | rzip+lzma | 164,377,635 | 02:19.10 | 9.974 | 11.164 |
| lrzip 0.721 | fastest | rzip+lzma | 159,383,452 | 02:18.21 | 10.287 | 11.326 |
| lrzip 0.631 | | rzip+zpaq | 140,330,077 | 04:57.33 | 11.684 | 5.263 |
| lrzip 0.721 | best | rzip+zpaq | 136,971,418 | 05:49.21 | 11.970 | 4.466 |
| zpaq 7.15 | slowest | zpaq | 153,487,409 | 06:36.40 | 10.682 | 3.693 |

rzip+zpaq provides the best compression, even over zpaq alone. You have to weigh whether the benefits are worth the extra time: with lrzip 0.721, rzip+zpaq took roughly 2.5 times as long as rzip+lzma (05:49.21 versus 02:18.21).
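For anyone who wants to weigh that trade-off in numbers, here is a small Python sketch that recomputes the compression ratio, the relative compression time, and the size saving from the figures posted above. The sizes and times are copied from the table. The function names and the choice of lrzip 0.721 rzip+lzma as the baseline are assumptions made for this illustration only.

```python
# Figures copied from the comparison above (linux-5.4.y.tar).
ORIGINAL_BYTES = 1_639_557_120

# name -> (compressed size in bytes, time to compress as MM:SS.ss)
RESULTS = {
    "lrzip 0.631 rzip+lzma": (164_377_635, "02:19.10"),
    "lrzip 0.721 rzip+lzma": (159_383_452, "02:18.21"),
    "lrzip 0.631 rzip+zpaq": (140_330_077, "04:57.33"),
    "lrzip 0.721 rzip+zpaq": (136_971_418, "05:49.21"),
    "zpaq 7.15 zpaq": (153_487_409, "06:36.40"),
}


def to_seconds(mmss):
    """Convert a MM:SS.ss string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


def summarize(results, baseline):
    """Compare every entry against the chosen baseline entry."""
    base_size, base_time = results[baseline]
    base_secs = to_seconds(base_time)
    summary = {}
    for name, (size, elapsed) in results.items():
        summary[name] = {
            # Original size divided by compressed size.
            "ratio": ORIGINAL_BYTES / size,
            # How many times longer than the baseline this run took.
            "time_x": to_seconds(elapsed) / base_secs,
            # Percent smaller than the baseline's compressed output.
            "saving_pct": 100 * (base_size - size) / base_size,
        }
    return summary


summary = summarize(RESULTS, baseline="lrzip 0.721 rzip+lzma")
for name, row in summary.items():
    print(f"{name:24s} ratio {row['ratio']:.3f}  "
          f"time x{row['time_x']:.2f}  saving {row['saving_pct']:.1f}%")
```

Against that baseline, lrzip 0.721 with rzip+zpaq takes about 2.5 times as long and produces output about 14% smaller. Whether that is worth it depends on how often the archive is written versus stored.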

The differences in lrzip 0.721 come from changes in how the buffer size is computed, the LZMA SDK, and the variable LZMA dictionary size. YMMV depending on file types.