aonez / Keka

The macOS & iOS file archiver
https://www.keka.io

New formats support #84

Open aonez opened 6 years ago

aonez commented 6 years ago

Some compression formats that could be added. Being in the list does not mean they will be added, just taken into account:

Just extraction:

dezzeus commented 6 years ago

It would be nice to also have Zstandard.

MaxPower85 commented 6 years ago

Since you have lrzip on the list, add zpaq too, since lrzip can optionally use zpaq for its 2nd stage... but you can use zpaq independently too.

You can also add rzip... lrzip is similar, but it's not the same format.

Maybe add Apple's lzfse too... but I'm not sure whether Apple meant it to be used on its own as an archive format or only within other formats, since I can't find any info about what extension should be used for archives compressed with lzfse. You can compress a single file or a tar archive with lzfse, and it seems pretty good for a format that doesn't use multithreading. People also say that lzfse is supposed to be energy efficient. But even the file command on Sierra doesn't seem to recognize the archive type if you compress files with lzfse...

https://github.com/lzfse/lzfse

If you look at a file compressed with lzfse in a hex editor, it says "bvx2" at the beginning... and here's a clue about what that means: https://github.com/lzfse/lzfse/blob/497c5c176732769abf36ccc71a31c06bad93a84d/src/lzfse_internal.h#L276-L281

So it doesn't seem that it would be difficult to recognize lzfse compressed archives... but the question is whether Apple intended it to be used on its own like bzip2 or gzip.
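To illustrate how simple that recognition could be, here is a minimal Python sketch that checks the first four bytes of a file. Only "bvx2" is confirmed in this thread; the other magic values are what I recall from the linked lzfse_internal.h and should be treated as assumptions until checked against that header. The function names are made up for the example.

```python
# Block magics. "bvx2" is the one seen in the hex editor above; the others are
# recalled from lzfse_internal.h (assumption: verify against the linked header).
LZFSE_BLOCK_MAGICS = {
    b"bvx-": "uncompressed block",
    b"bvx1": "LZFSE compressed block (v1 header)",
    b"bvx2": "LZFSE compressed block (v2 header)",
    b"bvxn": "LZVN compressed block",
    b"bvx$": "end-of-stream marker",
}


def identify_lzfse(header: bytes):
    """Return a description if the first four bytes look like an lzfse block, else None."""
    return LZFSE_BLOCK_MAGICS.get(header[:4])


def looks_like_lzfse(path: str) -> bool:
    """Read only the first four bytes of the file and compare them to the known magics."""
    with open(path, "rb") as f:
        return identify_lzfse(f.read(4)) is not None
```

This only answers "does the stream start with an lzfse block"; it says nothing about which file extension such archives should use.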

It can also be used for compressed .dmg images: create the image with hdiutil and pass -format ULFO, for example hdiutil create -volname vol_name -srcfolder source_folder -ov -format ULFO new_dmg_image.dmg

I'm reading that 7z beta for Windows has added support for .dmg images that use lzfse compression... but 7z for macOS or Linux doesn't seem to recognize them yet.

yetisyny commented 6 years ago

The .WIM format (Windows Imaging Format) has been supported for both compression and decompression by 7-Zip for Windows for several years. Since it is part of the relatively short list of filetypes 7-Zip for Windows supports not only reading but also writing to, even in the GUI, it ought to be included in Keka for feature parity with the Windows version of 7-Zip. There is also already a library and utility for the .WIM format that is cross-platform, at https://wimlib.net/, although this library is under GNU GPL version 3 so you cannot use it legally unless you start using that license too which I doubt you would want to do.

So using the 7-Zip implementation would probably work better license-wise. And actually the 7-Zip implementation for the .WIM format is already included in the p7zip ports to UNIX-based operating systems (including macOS, Linux, etc.). So directly using p7zip is probably the easiest way to do this; in fact, you already use p7zip for other things. As for the virtues of the .WIM format, or why anyone would want to use it: it is a file-based imaging format that can archive advanced filesystem features, can be used with several different compression algorithms, and is in widespread use, especially by Microsoft, which uses it for everything. Plus it is the ONLY compression format supported by the GUI and command-line versions of 7-Zip for Windows which Keka does not also support, so adding it would bring Keka to feature parity with 7-Zip regarding supported formats to compress to, and of course it is also there in p7zip. The other formats 7-Zip advertises on its website as being able to compress besides .WIM are 7Z, XZ, BZIP2, GZIP, TAR, ZIP, and you already support all of those! (I think 7-Zip also supports maybe a few more such as ISO, but those are not mentioned there; anyway, you already support ISO too.)

aonez commented 6 years ago

this library is under GNU GPL version 3 so you cannot use it legally

If that is true, then lbzip2 also can't be bundled within Keka.

d235j commented 6 years ago

Regarding bundling, please see https://www.gnu.org/licenses/gpl-faq.en.html#MereAggregation. If the proprietary components are not linking to the GPL components, then you should be OK; however, you need to provide source code to the GPL components.

aonez commented 6 years ago

Thanks @d235j, you're right. Already started pushing the GPL code here 😊

magitk commented 6 years ago

+1 for zpaq

dh1337 commented 5 years ago

any news on brotli?

gingerbeardman commented 5 years ago

@denishamann1337 out of interest what is your use case for Brotli? What would it achieve that other existing formats or schemes can not?

p2k commented 5 years ago

+1 for zpaq

It is the best pack format I know, combining deduplication with strong compression that outperforms every competitor. It actually allows multiple versions of the same file(s), so it is suitable for incremental backups. Needless to say, it offers industry-standard encryption.

Having a GUI for zpaq would be bliss, but it is considerably harder to do than for all the other formats, since it has some unique features (like the aforementioned multi-version capability).

More information on zpaq: http://mattmahoney.net/dc/zpaq.html

dh1337 commented 5 years ago

@denishamann1337 out of interest what is your use case for Brotli? What would it achieve that other existing formats or schemes can not?

I read some interesting benchmarks lately (e.g.: http://www.instantshift.com/2018/03/02/gzip-vs-brotli-compression/). On the same note, Brotli is supported by the 7z extension (see: https://github.com/mcmilk/7-Zip-zstd) and I would love to have the same "compatibility" in Keka compared to 7z on Windows. I feel like having better performance for some use cases and being supported next to gzip in all major browsers makes it a de facto standard (see: https://caniuse.com/#search=brotli).

gingerbeardman commented 5 years ago

@p2k is this not a problem?

zpaq is for user-level backups. Do not use it to back up the operating system or any software that requires a password to install. zpaq saves regular files and directories, last-modified dates (to the nearest second), and (optionally) Windows attributes or Linux permissions. It does not follow or save symbolic links or junctions. It unknowingly follows hard links. It does not save owner or group IDs, ACLs, extended attributes, the registry, or special file types like devices, sockets, or named pipes.

p2k commented 5 years ago

@gingerbeardman not for me. I don't use it to back up an operating system or things like an .app bundle on macOS (which often contain symlinks). But if I wanted to, I could always resort to piping a tar archive to zpaq.

It might be an idea to do a pre-check when archiving stuff with zpaq, though, and warn the user. That's a good point.
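A rough Python sketch of what such a pre-check could look like, based only on the limitations quoted above (symlinks not saved, hard links followed, special files not saved). The function name and warning texts are invented for illustration; this is not how Keka does or would do it.

```python
import os
import stat
from typing import List


def precheck_for_zpaq(root: str) -> List[str]:
    """Walk `root` and list entries that the quoted zpaq notes say are not preserved."""
    warnings = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat inspects the link itself, not its target
            if stat.S_ISLNK(st.st_mode):
                warnings.append(f"symbolic link (not saved): {path}")
            elif stat.S_ISREG(st.st_mode):
                if st.st_nlink > 1:
                    warnings.append(f"hard link (followed as an ordinary file): {path}")
            elif not stat.S_ISDIR(st.st_mode):
                warnings.append(f"special file (not saved): {path}")
    return warnings
```

If the returned list is non-empty, the GUI could show it and suggest wrapping the selection in a tar archive first, as mentioned above.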

aonez commented 5 years ago

@denishamann1337 I checked again and Brotli still does not even have a magic number. So it is still focused on streaming data over the network, for use in browsers. That is why it is compared with gzip, which is also used in browsers.

That said, as it is fairly easy to add support for Brotli, here is a test build: https://github.com/aonez/Keka/releases/tag/dev-test-builds

dh1337 commented 5 years ago

@aonez I see, I assumed the magic number existed by now. Thanks for taking the effort to check :)

jamie-arcc commented 5 years ago

+1 for Zstd and zpaq!

systemcrash commented 5 years ago

+1 for Zstd / Zstandard

dual BSD and GPLv2 licensed C library

aonez commented 5 years ago

@jamie-arcc and @systemcrash check out the latest v1.2.0-dev.3494 test build, it has Zstandard support 😊

systemcrash commented 5 years ago

First thoughts on https://github.com/aonez/Keka/releases/tag/v1.2.0-dev.3417

What Zstd compression numbers correspond to the slider? (Store, Fastest, Fast...) - could this info be hinted in the GUI?

-# : # compression level (1-19, default: 3)
Store = 1
Fastest = 4
Fast = 7
Normal = 10
Slow = 14
Slowest = 19
?

aonez commented 5 years ago

@systemcrash it goes 1, 2, 3, 6, 8 and 9. The method (level) slider should be enhanced to adapt #112. Most cases use 0-9, this case and also RAR (0-5) are different. Also a dynamic slider is much needed for a finer selection.

systemcrash commented 5 years ago

Forget everything above 15 - tradeoffs are rarely worth it for the diminishing gains above level 15.

Why make things static? Look at the library range, then draw the slider based on this. Now 6 stops on the slider:
(int)floor(15/6 * 1)
(int)floor(15/6 * 2)
(int)floor(15/6 * 3)
(int)floor(15/6 * 4)
(int)floor(15/6 * 5)
(int)floor(15/6 * 6)
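For reference, the evenly spaced stops suggested above can be sketched in Python like this. The function name and parameters are illustrative, and the maximum of 15 is just the cut-off proposed in this comment, not a library constant.

```python
def slider_levels(max_level: int, stops: int, min_level: int = 1) -> list:
    """Spread `stops` slider positions evenly over the range up to `max_level`.

    Integer arithmetic (max_level * i // stops) equals floor(max_level / stops * i)
    without any floating-point rounding surprises.
    """
    return [max(min_level, (max_level * i) // stops) for i in range(1, stops + 1)]


print(slider_levels(15, 6))  # [2, 5, 7, 10, 12, 15]
```

Note that if the expression were compiled literally as C with integer operands, 15/6 would already truncate to 2 and the stops would come out as 2, 4, 6, 8, 10, 12, so the division has to happen last or in floating point.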

You closed the source because of all the copy-cats in the App Store, yah?

aonez commented 5 years ago

Made a quick test and 15-19 resulted in 13% more savings. So if the next dev build does not have the dynamic slider yet, it will use 1, 2, 3, 4, 15 and 19. So far I'm impressed with Zstd, although 7z still has better speed/ratio.

[Screenshot: Screen Shot 2019-06-27 at 11 24 37]

You closed the source because of all the copy-cats in the App Store, yah?

It was the trigger, yep.

systemcrash commented 5 years ago

7z is a format, not an algorithm. Which algorithm was used, LZMA?

gingerbeardman commented 5 years ago

How is the support for Zstd across platforms?

akrabu commented 4 years ago

Made a quick test and 15-19 resulted in 13% more savings. So if the next dev build does not have the dynamic slider yet, it will use 1, 2, 3, 4, 15 and 19. So far I'm impressed with Zstd, although 7z still has better speed/ratio.

[Screenshot: Screen Shot 2019-06-27 at 11 24 37]

For what it's worth, I ran the latest build (1.2.0.3542) at the highest compression level for Zstd on an old Outlook PST file I was intending to archive, and achieved the following:

Original: 7.34GB
Brotli: 5.84GB (Keka, slowest method)
Zstd: 5.11GB (Keka, slowest method)
7z: 4.86GB (Keka, slowest method)
ZPAQ: 4.84GB (zpaq a mailbox.pst.zpaq mailbox.pst -m5)
XZ: 4.53GB (xz -e --lzma2=preset=9,dict=1610612736,nice=273 --memory=90% mailbox.pst)
Zstd: 4.46GB (zstd -22 --ultra --long=31 --single-thread mailbox.pst)
Lrzip (LZMA): 4.44GB (lrzip --lzma -L 9 -U mailbox.pst)
Lrzip (ZPAQ): 4.34GB (lrzip -z -L 9 -U mailbox.pst)
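To make those numbers easier to compare, here is a small Python sketch that turns the sizes above into percentage savings relative to the 7.34GB original. The labels are shortened by me; the sizes are copied from the results above and nothing else is measured.

```python
ORIGINAL_GB = 7.34

# Sizes copied from the results above, in GB.
RESULTS_GB = {
    "Brotli (Keka)": 5.84,
    "Zstd (Keka)": 5.11,
    "7z (Keka)": 4.86,
    "ZPAQ -m5": 4.84,
    "XZ": 4.53,
    "Zstd --long=31": 4.46,
    "Lrzip (LZMA)": 4.44,
    "Lrzip (ZPAQ)": 4.34,
}


def savings_percent(compressed_gb: float, original_gb: float = ORIGINAL_GB) -> float:
    """Space saved as a percentage of the original size."""
    return (1 - compressed_gb / original_gb) * 100


# Print the results from smallest to largest archive.
for name, size in sorted(RESULTS_GB.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {size:5.2f} GB  {savings_percent(size):5.1f}% saved")
```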

The "long range mode" in Zstd is rather impressive. The only thing that seems to beat it is Lrzip (aka Long Range ZIP, not LZIP), which takes significantly longer (and the ZPAQ method takes the same amount of time to DEcompress as well - in this case, 10 hours).

With that in mind, could we...

Apologies if I'm over-complicating the UI, but I thought I'd throw it out there. I just really love using Zstd's long range option for very large files with redundant data (archiving mailboxes, for instance). It works WAY faster than Lrzip, which tries to do something somewhat similar. Zstd appears to use a window of 2147483648 bytes (~2GB) to look for patterns, at least on this specific test file, which isn't quite as effective as Lrzip's "sliding window" but it sure performs faster.

Note: Zstd will throw an error during testing or extraction if you don't use a large enough window for an archive that was compressed with a larger than normal window. Example:

akrabu-macbook-air:~ akrabu$ zstd --test mailbox.pst.zst
mailbox.pst.zst : Decoding error (36) : Frame requires too much memory for decoding
mailbox.pst.zst : Window size larger than maximum : 2147483648 > 134217728
mailbox.pst.zst : Use --long=31 or --memory=2048MB
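The two numbers in that error are powers of two (2^31 and 2^27 bytes), and the suggested flags follow directly from the window size. A minimal Python sketch of that arithmetic, using only the values shown in the message; the function names are made up, and this does not call into zstd at all:

```python
DEFAULT_MAX_WINDOW = 134217728  # limit shown in the error message (2**27 bytes, 128 MiB)


def window_log(window_bytes: int) -> int:
    """Smallest n such that 2**n >= window_bytes."""
    return (window_bytes - 1).bit_length()


def suggest_flags(window_bytes: int, limit: int = DEFAULT_MAX_WINDOW):
    """Return the flags to retry with, or None if the window already fits the limit."""
    if window_bytes <= limit:
        return None
    memory_mb = -(-window_bytes // (1024 * 1024))  # ceiling division to whole MB
    return f"--long={window_log(window_bytes)}", f"--memory={memory_mb}MB"


print(suggest_flags(2147483648))  # ('--long=31', '--memory=2048MB')
```

An extractor that reads the required window size up front could use the same check to decide whether to retry with a larger limit instead of failing.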

This also means that, presently, Keka will fail to extract files made with large windows:

[Screenshot: Screen Shot 2019-09-16 at 2 34 52 PM]

P.S. I also tried Brotli's --large-window option, but it was unremarkable in this case, and resulted in a size comparable to what Keka's max accomplished already.

MaxPower85 commented 3 years ago
  • [x] zpaq (LRZIP as suggested by @MaxPower85) -> 1.2.0r 3806+ LRZIP in slow method

This needs a correction...

LRZIP can use various compression formats on parts of the archive, but it's a separate format... it can use ZPAQ, but ZPAQ is its own archiving format which can be quite useful to have on its own too, especially if files that share a lot of the same data are added to an existing archive later, since it does not compress them again and just reuses the data that was the same... the archive can also be "rolled back" to retrieve an earlier version of some file.
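As a toy illustration of that idea only (reusing data that is already in the archive, and keeping earlier versions retrievable), here is a Python sketch. Fixed-size chunks, SHA-256 and the class name are my own simplifications for the example; this is not zpaq's actual format or algorithm.

```python
import hashlib


class ToyDedupArchive:
    """Append-only store: each update records a new version but stores unseen chunks only."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}    # digest -> chunk bytes, each stored once
        self.versions = []  # one {filename: [digest, ...]} snapshot per update

    def add(self, name: str, data: bytes) -> int:
        """Record a new version of `name`; return how many new chunks were stored."""
        new_chunks = 0
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_chunks += 1
            digests.append(digest)
        # Copy the previous snapshot so older versions stay untouched.
        snapshot = dict(self.versions[-1]) if self.versions else {}
        snapshot[name] = digests
        self.versions.append(snapshot)
        return new_chunks

    def extract(self, name: str, version: int = -1) -> bytes:
        """Rebuild `name` as it was at `version` (default: the latest)."""
        return b"".join(self.chunks[d] for d in self.versions[version][name])
```

Adding b"aaaabbbbcccc" and then b"aaaabbbbdddd" under the same name stores three chunks the first time and only one the second time, and extract(name, 0) still returns the first version.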

gingerbeardman commented 2 years ago

DAR (Disk ARchive) https://dar.sourceforge.io

akrabu commented 2 years ago

Oh I'd love to have DAR support. It can do SO much. I just thought it might be too much to support in such a little Keka window, you know? It's SO configurable, though I guess basic support would be fine.

I use it to make 50GB archives with par2 files and burn them all to Blu-rays to back up my pictures.

gingerbeardman commented 1 year ago

Another odd LZH, from Atari ST

http://discmaster.textfiles.com/file/11869/www.umich.edu.archive.2014.03.zip/www.umich.edu/~archive/atari/Games/Puzzle/nanjin11.lzh

akrabu commented 1 year ago

Another odd LZH, from Atari ST

http://discmaster.textfiles.com/file/11869/www.umich.edu.archive.2014.03.zip/www.umich.edu/~archive/atari/Games/Puzzle/nanjin11.lzh

Wow. That brings back memories. Been a long time since I came across an LHA/LZH archive!

gingerbeardman commented 1 year ago

Wow. That brings back memories. Been a long time since I came across an LHA/LZH archive!

I dive into old software often and this was also a wow moment for me @akrabu !

LZH archives are encountered frequently on classic Mac, especially with Japanese software. Atari ST was my first computer!

Sytten commented 11 months ago

The latest version doesn't seem to be able to decompress ZIP files that use Zstd compression.

aonez commented 11 months ago

@Sytten can you open a new issue including a test file that meets those conditions?

gingerbeardman commented 4 months ago

https://web.archive.org/web/20040318005247/http://www.mars.dti.ne.jp:80/~odaki/sounds/crutch.mdz

.mdz is a zip file containing a "MOD" (module music file, but can be many types: .mod, .xm, .it, etc)

Simply renaming it to .zip and extracting is a workaround.
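Since the content is a plain ZIP, it can also be detected without looking at the extension. A minimal Python sketch using the standard zipfile module; the helper name is made up, and this only shows the detection idea, not how Keka identifies formats.

```python
import io
import zipfile


def list_if_zip(fileobj):
    """Return member names if `fileobj` holds a ZIP archive, else None.

    zipfile.is_zipfile inspects the data itself, so a .mdz (or any other
    extension) is recognised as long as the content is a ZIP.
    """
    if not zipfile.is_zipfile(fileobj):
        return None
    fileobj.seek(0)
    with zipfile.ZipFile(fileobj) as zf:
        return zf.namelist()


# Usage: build a small ZIP in memory, as if it had been read from a .mdz file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("song.mod", b"placeholder module data")
print(list_if_zip(buf))  # ['song.mod']
```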

aonez commented 4 months ago

It is a ZIP indeed. Keka can extract it (using the alternate option or the extract action in the contextual menu). I will add it to the supported formats :)

gingerbeardman commented 2 months ago

https://delta-skins.github.io/nds.html

gingerbeardman commented 3 weeks ago