ckolivas / lrzip

Long Range Zip
http://lrzip.kolivas.org
GNU General Public License v2.0

Consider dropping alternate compression methods #187

Closed: pete4abw closed this 3 years ago

pete4abw commented 3 years ago

After the discussion on zstd support (#61), it dawned on me that there will always be some new "better" compressor, or one that becomes fashionable.

gzip is gzip, bzip is bzip, zpaq is zpaq, lzo is lzo, even lzma is lzma. lrzip files using those compression methods are incompatible with the native applications anyway. Since lzma has been shown to be better overall than all the others when time and compression ratio are considered together, what benefit is there to keeping them? rzip+lzma is all that is needed.

As you mentioned, @ckolivas , no one is using the library.

One program, one purpose.

I suggest we prune down the codebase and dump all the alternative compression methods and the library. Pure lrzip will be the result.

JM2C

MaxPower85 commented 3 years ago

Hello :)

Keka (one of the most popular macOS archivers) has started using LRZIP... It's not very configurable at the moment, but I suggested making the LRZIP options more configurable.

So I would suggest not changing the format in any way that could break compatibility between versions of LRZIP... since users might be discouraged from adopting the format if they thought compatibility could break.

I did some tests of LRZIP compression on a big PSD file and I was very impressed with how much it managed to compress it, especially when using the ZPAQ option... so I would like LRZIP to see wide adoption... and if some popular archiver for Windows also started supporting it, I think many would probably see it as one of their favorite formats...

But the format has to be very standardized, and the compression options supported by LRZIP itself have to be clearly limited to specific formats. If someone wants to use another compression format, they can pass the file from LRZIP to another compressor, which could add its own file header and another extension to let users know that it uses some other format. That way, anyone who gets a .lrz archive could uncompress it with whatever LRZIP they have, and there wouldn't be incompatible versions of LRZIP where one supports a format and another one doesn't.

For formats that have already been added, I think that if you wanted to drop support for them, it should be done in a way that maintains full compatibility with LRZIP archives compressed using those formats, for example by letting LRZIP use some external binaries to decompress them...

If the "out of the box" support for GZIP, BZIP2 and ZPAQ was dropped, I think LRZIP should retain full support for LRZIP archives in those formats and just add some options like --external-bzip2, --external-zpaq, --external-gzip and --external-lzo... It could also do the same for LZMA and make that external too... and if the user tries decompressing archives that use those compression formats, it should show a message like "This is a ZPAQ compressed LRZIP archive. Please use --external-zpaq and specify the ZPAQ binary".
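A minimal sketch of how that fallback could behave, assuming a decoder that handles 'none' and 'lzma' natively and maps every other method to one of the --external-* flags proposed above. The method names, the NATIVE set, MissingDecoder and resolve_decoder() are all hypothetical illustrations; the flags are only the ones suggested in this comment, not existing lrzip options.

```python
# Hypothetical method names; lrzip's real per-block type codes are not modelled here.
NATIVE = {"none", "lzma"}

# The --external-* flags proposed above (not existing lrzip options).
EXTERNAL_FLAGS = {
    "gzip": "--external-gzip",
    "bzip2": "--external-bzip2",
    "lzo": "--external-lzo",
    "zpaq": "--external-zpaq",
}


class MissingDecoder(LookupError):
    """Raised when an archive needs an external binary the user has not supplied."""


def resolve_decoder(method: str, external: dict[str, str]) -> str:
    """Return "native", or the path of the external binary to run for `method`."""
    if method in NATIVE:
        return "native"
    if method in external:
        return external[method]
    flag = EXTERNAL_FLAGS.get(method)
    if flag is None:
        raise ValueError(f"Unknown compression method: {method!r}")
    name = method.upper()
    raise MissingDecoder(
        f"This is a {name} compressed LRZIP archive. "
        f"Please use {flag} and specify the {name} binary."
    )


if __name__ == "__main__":
    # Hypothetical path, e.g. collected from a command-line flag or baked in at build time.
    user_binaries = {"zpaq": "/usr/local/bin/zpaq"}
    for method in ("lzma", "zpaq", "bzip2"):
        try:
            print(method, "->", resolve_decoder(method, user_binaries))
        except MissingDecoder as err:
            print(err)
```

The compile-time option from the next paragraph would amount to pre-populating the `external` mapping, so the user is never prompted.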

It could also let the user specify the external binaries for ZPAQ, GZIP, BZIP2 and LZO when compiling LRZIP, so LRZIP wouldn't have to ask the user which external binaries to use.

For wide adoption of LRZIP, it is very important that the format it uses is a strict standard, so that if one archiver adds support for LRZIP, it can uncompress LRZIP archives from any other archiver without issues.

If there are new ideas for changes to the format, support for the old format should be maintained and the new format added as an additional format... if really needed, there could be LRZIP A, LRZIP B, LRZIP C and so on... and those should be clearly distinguished by a file header.

More compression formats could also be added through external binaries, but in some standardized way, and they should not be considered part of the existing format. The existing format, with all the compression options it supports, could be "LRZIP A", and "LRZIP B" could then specify additional formats, designed with both forward and backward compatibility in mind... so that if someone compiled an older version of the binary that supports only "LRZIP A" and tried uncompressing a "LRZIP B" archive that uses some other compression format, LRZIP could just ask the user to specify an external binary for that compression format.
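As a sketch of how the "LRZIP A / LRZIP B" idea could work, assume a header carrying a magic string, a format generation letter and a method ID. The magic b"LRZX", the 6-byte layout, the method IDs and classify() are all invented for illustration and do not describe lrzip's real header. The decoder accepts any method it knows, asks for an external binary when a newer generation uses an unknown method, and treats an unknown method inside a generation it fully supports as corruption.

```python
import struct

# Invented layout for illustration only: 4-byte magic, 1-byte generation letter,
# 1-byte method ID. This is NOT lrzip's actual header.
MAGIC = b"LRZX"
HEADER = struct.Struct("<4scB")
METHODS = {0: "none", 1: "lzma", 2: "zpaq"}  # hypothetical method IDs


def pack_header(generation: str, method_id: int) -> bytes:
    """Build a header for the given generation letter (e.g. "A") and method ID."""
    return HEADER.pack(MAGIC, generation.encode("ascii"), method_id)


def classify(data: bytes, supported: frozenset[str] = frozenset({"A"})) -> tuple[str, object]:
    """Decide how a decoder that implements `supported` generations should treat an archive."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, gen_byte, method_id = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not an archive in this sketch's format")
    generation = gen_byte.decode("ascii")
    method = METHODS.get(method_id)
    if method is not None:
        # Known method: readable no matter which generation wrote it.
        return ("native", method)
    if generation not in supported:
        # Newer generation using a method this build lacks: ask for an external binary.
        return ("external", method_id)
    # A generation we fully implement should never carry an unknown method.
    raise ValueError(f"corrupt header: method {method_id} in generation {generation}")
```

Keeping the generation letter separate from the method ID is what lets an old build tell "newer archive, unfamiliar method" apart from plain corruption.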

The format used now could simply become the "LRZIP A" format, and the stated goal of such standardization would be to let users know that current versions will be fully compatible with "LRZIP A" archives created by any future version of LRZIP, and that any future version will fully support the current format... and that any later changes would form a separate standard, while the current one is kept as well.

pete4abw commented 3 years ago

It's important to remember that lrzip is still pre-release in terms of versioning. This means that backward compatibility is not assured: features may come, go, or be modified. For example, my fork, which is at version 0.7, has updated SDKs for lzma and zpaq and adds filtering, which will make its file format incompatible with this version. @ckolivas made efforts to support decompressing files going back to 0.4. Interestingly, with the exception of lzma, which has a 5-byte descriptor in the header, all other methods are defined in each compressed block of data. You can see this by running lrzip -vvi file.lrz. This is because some blocks may be incompressible and will have a compression type of 'none', whereas other blocks may have 'lzma' etc. My fork employs variable block sizes for zpaq, which are decoded on decompression.
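To illustrate per-block compression types, here is a minimal sketch using Python's standard lzma module: each block is tagged LZMA only if compression actually shrinks it, otherwise it is stored raw as NONE, and the decoder dispatches on each block's own tag. The tag values, block size and block layout are hypothetical, not lrzip's format. In particular, this sketch gives every block its own legacy .lzma header (lzma.FORMAT_ALONE), whose first 5 bytes are the properties byte (lc/lp/pb) followed by a little-endian dictionary size; lrzip instead keeps its lzma descriptor once in the file header.

```python
import lzma
import os

# Hypothetical per-block type tags (not lrzip's real on-disk values).
NONE, LZMA = 0, 1


def compress_blocks(data: bytes, block_size: int = 1 << 16) -> list[tuple[int, bytes]]:
    """Split data into blocks and tag each one with the method actually used."""
    blocks = []
    for start in range(0, len(data), block_size):
        raw = data[start:start + block_size]
        packed = lzma.compress(raw, format=lzma.FORMAT_ALONE)
        # Incompressible blocks (random or already-compressed data) are stored raw.
        if len(packed) < len(raw):
            blocks.append((LZMA, packed))
        else:
            blocks.append((NONE, raw))
    return blocks


def decompress_blocks(blocks: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original data by dispatching on each block's own tag."""
    out = bytearray()
    for ctype, payload in blocks:
        if ctype == LZMA:
            out += lzma.decompress(payload, format=lzma.FORMAT_ALONE)
        elif ctype == NONE:
            out += payload
        else:
            raise ValueError(f"unknown block type {ctype}")
    return bytes(out)


def lzma_props(header: bytes) -> tuple[int, int, int, int]:
    """Decode the 5-byte LZMA descriptor into (lc, lp, pb, dictionary size)."""
    props = header[0]  # encoded as (pb * 5 + lp) * 9 + lc
    lc, rest = props % 9, props // 9
    lp, pb = rest % 5, rest // 5
    dict_size = int.from_bytes(header[1:5], "little")
    return lc, lp, pb, dict_size


if __name__ == "__main__":
    # Repetitive text followed by random bytes: early blocks compress, the last does not.
    sample = b"lrzip " * 20000 + os.urandom(70000)
    blocks = compress_blocks(sample)
    print("block types:", [t for t, _ in blocks])
    print("first block descriptor:", lzma_props(blocks[0][1]))
    assert decompress_blocks(blocks) == sample
```

Storing the type per block rather than per file is what allows a single archive to mix 'lzma' and 'none' blocks, which is the behaviour lrzip -vvi reveals.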

IDK Keka, but if they are using the lrzip library, they do so at their own peril right now.

pete4abw commented 3 years ago

Just closing.