ckolivas / lrzip

Long Range Zip
http://lrzip.kolivas.org
GNU General Public License v2.0

(Feature request) Add LZMA2 compression support #40

Closed PaintNinja closed 9 years ago

PaintNinja commented 9 years ago

Currently, lrzip supports LZMA. However, a revised version of this algorithm, LZMA2, which achieves similar compression ratios at significantly higher performance, has been released and is supported in archive formats such as 7z and XZ.

It would be great if LZMA2 could become the new default algorithm in lrzip, as it would deliver similar results at better performance.

pete4abw commented 9 years ago

+1 Good suggestion! I have noticed that xz has been getting higher compression levels than lrzip.

ckolivas commented 9 years ago

I did not notice lzma2 getting better compression than lzma when I tested it. Do you have a link that demonstrates this?

pete4abw commented 9 years ago

Let me clarify. It turns out that 700MB kernel source files come out slightly smaller in XZ than in LRZ using standard options. But a larger 10.8GB Android backup file compressed slightly better with LRZ than with XZ, and XZ took a lot longer. LZMA2 is just a container where multiple streams can have LZMA with different dictionaries, but it's still LZMA. In fact, it's a lot like what lrzip does now. So, since lrzip basically does what the lzma2 container supports, it's not really necessary to implement, IMHO. (me slaps myself for not researching before responding).
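The "it's still LZMA" point can be checked with a small experiment. This is a minimal sketch that uses Python's standard `lzma` module (liblzma), not lrzip's own LZMA code, and the synthetic word list is an assumption made only to get compressible input. `FORMAT_ALONE` writes the legacy .lzma format (LZMA1) and `FORMAT_XZ` writes the .xz container (LZMA2).

```python
import lzma
import random

# Synthetic, source-like input (an assumption of this sketch, not data from the thread).
rng = random.Random(0)
words = [b"static", b"int", b"return", b"struct", b"void",
         b"if", b"else", b"while", b"#include", b"const"]
data = b" ".join(rng.choice(words) for _ in range(100_000))

# Same preset for both, so only the format and filter differ.
lzma1 = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=6)  # legacy .lzma, LZMA1
lzma2 = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)     # .xz container, LZMA2

# Both streams must restore the original bytes.
roundtrip_ok = lzma.decompress(lzma1) == data and lzma.decompress(lzma2) == data

print(f"input: {len(data)} bytes")
print(f"LZMA1: {len(lzma1)} bytes")
print(f"LZMA2: {len(lzma2)} bytes")
print(f"round trip ok: {roundtrip_ok}")
```

On input like this the two sizes should land within a small container overhead of each other, which is consistent with not seeing a ratio gain from LZMA2.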

PaintNinja commented 9 years ago

After doing some tests of my own on a game's source files (686MB), lrzip does seem to perform better than XZ with regard to compression ratio, but it took longer than XZ to process.

Tar: 686MB
LRZ: 307MB (4.5MB/sec)
XZ: 324MB (7MB/sec)

Interestingly though, only preparing the file with LRZ (no compression) and then compressing the prepared file with XZ took less time than full LRZ and gave a smaller file than plain XZ (316MB).

Compression-wise, the current default in lrzip with LZMA is great, but speed-wise, it may be possible to improve performance more by sacrificing a bit of compression ratio with LZMA2. Since the differences aren't that big, I have to agree with pete that it's not really necessary to implement, but may come in handy when you want a midpoint between GZIP and LZMA when using LRZip.

pete4abw commented 9 years ago

The file is small, and the difference in time is negligible. The lrzip pre-processor is what distinguishes it from other programs. lrzip will show better results on large files where there is greater randomness. Source files have a lot of similarities since they are text based. You will always be able to find one file or another on which another program does better than lrzip. It's when the size of the file exceeds memory that lrzip will consistently do better. One reason is how lrzip handles data that will not compress well: recall that we test for compressibility and skip compression if the LZO test fails, in which case just the rzip chunk is stored. XZ will try and try to compress even if the data is not good for compression. Take this example from an Android backup, uncompressed and put into a tar.

peter@tommyiv:/tmp$ time LRZIP=NOCONFIG lrzip backup.tar
Output filename is: backup.tar.lrz
backup.tar - Compression Ratio: 1.244. Average Compression Speed: 14.110MB/s.
Total time: 00:01:13.18

real 1m13.188s
user 7m15.305s
sys 0m7.189s

peter@tommyiv:/tmp$ time xz -k backup.tar

Compression Ratio: 1.22
Compression Speed: 1.54 MB/s

real 9m33.633s
user 6m37.555s
sys 0m2.212s

Here are the file sizes.

1,080,796,160 Mar 24 12:58 backup.tar
868,588,596 Mar 24 13:00 backup.tar.lrz
882,470,968 Mar 24 12:58 backup.tar.xz

So, 1 min 13 sec versus 9 min 33 sec. In this case the XZ output was a little larger and XZ was much slower. This is because of the LZO test, and it is why lrzip is so much better with large files but maybe not as good with source files. If you ran the same game file test without the LZO test (lrzip -T), I think you would find better speed.
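The idea behind the LZO test can be sketched in a few lines: run a cheap compressor over a block first, and only hand the block to the slow compressor if the cheap pass shrinks it. This is an illustration of the technique, not lrzip's code. `zlib` level 1 stands in for LZO (which is not in the Python standard library), and the "shrinks at all" threshold, the block contents, and the function names are assumptions of this sketch.

```python
import lzma
import os
import zlib

def looks_compressible(block: bytes) -> bool:
    # Cheap probe: zlib level 1 stands in for LZO here (assumption).
    # Hypothetical threshold: any size reduction counts as compressible.
    return len(zlib.compress(block, 1)) < len(block)

def pack_block(block: bytes):
    """Return ("lzma", payload) or ("stored", payload) for one block."""
    if looks_compressible(block):
        return "lzma", lzma.compress(block)
    # The probe failed: skip the slow compressor and keep the bytes as they are.
    return "stored", block

text_block = b"static int foo(void) { return 0; }\n" * 4096
random_block = os.urandom(len(text_block))  # stands in for already-compressed data

for name, block in (("text", text_block), ("random", random_block)):
    kind, payload = pack_block(block)
    print(f"{name}: {kind}, {len(block)} -> {len(payload)} bytes")
```

The random block is stored untouched after one fast probe, which is where the time saving on poorly compressible data comes from. A compressor without such a probe spends its full effort on that block for no gain.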

Enjoy.

PaintNinja commented 9 years ago

Very impressive... I'll be using this as my preferred compression format for large files. Thank you for the explanation. :)

pete4abw commented 9 years ago

This was a good exercise. I did another run on a 1GB tar file containing two kernel version sources, 3.10 and 3.19. The results still favor lrzip, even when I prepare a file for xz using lrzip -n. Here is the output. lrzip is still about 8 times faster (1:09 versus 9:07) and its compressed file was 38% smaller than the one from xz. The rzip preprocessor is quite helpful.

peter@tommyiv:/tmp$ LRZIP=NOCONFIG lrzip linux.test.tar
Output filename is: linux.test.tar.lrz
linux.test.tar - Compression Ratio: 11.026. Average Compression Speed: 15.057MB/s.
Total time: 00:01:09.58

peter@tommyiv:/tmp$ time xz -vk linux.test.tar
linux.test.tar (1/1)
100 % 153.5 MiB / 1,054.8 MiB = 0.146 1.9 MiB/s 9:07

1,106,022,400 Mar 25 12:55 linux.test.tar
100,314,220 Mar 25 12:58 linux.test.tar.lrz
160,970,956 Mar 25 12:55 linux.test.tar.xz

You will note that xz took 9 minutes and the resulting file was 60% larger than lrzip!
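The percentages quoted for this run follow directly from the listed byte counts. The short calculation below uses only the sizes and times from the transcripts above, and shows that "38% smaller" and "60% larger" are the same gap measured from different baselines.

```python
# Sizes in bytes and wall-clock times copied from the transcripts above.
original = 1_106_022_400   # linux.test.tar
lrz = 100_314_220          # linux.test.tar.lrz
xz = 160_970_956           # linux.test.tar.xz
lrzip_seconds = 69.58      # Total time: 00:01:09.58
xz_seconds = 9 * 60 + 7    # 9:07

lrzip_ratio = original / lrz           # how lrzip reports it: original / compressed
xz_fraction = xz / original            # how xz -v reports it: compressed / original
pct_smaller = (1 - lrz / xz) * 100     # lrz relative to xz
pct_larger = (xz / lrz - 1) * 100      # xz relative to lrz
speedup = xz_seconds / lrzip_seconds   # wall-clock ratio

print(f"lrzip ratio:        {lrzip_ratio:.3f}")
print(f"xz fraction:        {xz_fraction:.3f}")
print(f"lrz smaller than xz: {pct_smaller:.1f}%")
print(f"xz larger than lrz:  {pct_larger:.1f}%")
print(f"lrzip speedup:       {speedup:.1f}x")
```

Note that lrzip and xz print their ratios in opposite directions, so 11.026 and 0.146 are not directly comparable without inverting one of them.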

Next I prepared the tar file using lrzip -n. It took 19 seconds to halve the file for xz.

peter@tommyiv:/tmp$ LRZIP=NOCONFIG lrzip -n -o linux.test.tar.prep linux.test.tar
linux.test.tar - Compression Ratio: 2.059. Average Compression Speed: 58.556MB/s.
Total time: 00:00:19.90

peter@tommyiv:/tmp$ xz -vk linux.test.tar.prep
linux.test.tar.prep (1/1)
100 % 95.0 MiB / 512.3 MiB = 0.185 1.7 MiB/s 4:52

537,148,983 Mar 25 13:15 linux.test.tar.prep
99,600,772 Mar 25 13:15 linux.test.tar.prep.xz

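The effect of lrzip -n was to help xz by roughly halving the time it needed to process the file (4:52 instead of 9:07, plus about 20 seconds for the lrzip -n pass). The bad news is that the prepared file is not usable on its own: after xz decompression it would still need to be extracted using lrzip.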
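To make the round trip concrete, here is a hedged sketch that only builds the command lines for the two-stage approach and its reversal. The helper names are hypothetical. One deliberate difference from the transcript: the intermediate file keeps lrzip's default .lrz suffix instead of .prep, an assumption made so that `lrzip -d` sees its usual extension. Nothing runs unless `run_all` is called, and that requires lrzip and xz to be installed.

```python
import shutil
import subprocess

def two_stage_commands(tar_path: str):
    """Build compress and restore command lines for rzip-only followed by xz."""
    prepared = tar_path + ".lrz"   # lrzip's default output name (not .prep)
    compress = [
        ["lrzip", "-n", tar_path],   # rzip pre-processing only, no back-end compression
        ["xz", prepared],            # writes prepared + ".xz" and removes prepared
    ]
    restore = [
        ["xz", "-d", prepared + ".xz"],   # gives back the rzip stream, not the tar
        ["lrzip", "-d", prepared],        # lrzip is still needed to get the tar back
    ]
    return compress, restore

def run_all(commands):
    """Run each command in order; stop at the first failure."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            raise RuntimeError(f"{argv[0]} is not installed")
        subprocess.run(argv, check=True)

compress, restore = two_stage_commands("linux.test.tar")
for argv in compress + restore:
    print(" ".join(argv))
```

The restore list shows why the prepared file is inconvenient to distribute: whoever unpacks it needs both tools, and the final `lrzip -d` step will not overwrite an existing linux.test.tar without being told to.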