conor42 / fast-lzma2

Fast LZMA2 Library
BSD 3-Clause "New" or "Revised" License

No documentation? #7

Open ApexMods opened 4 years ago

ApexMods commented 4 years ago

Is there any documentation for flzma2, e.g. max dictionary size, fast bytes, cycles, compression levels etc.? I've been wading through the source to look for selectable parameters, saw that dictionary is limited to 1024MB, and apparently there are 3 compression strategies with 10 compression levels each, but I have no idea how to call them. Any help would be much appreciated.

ApexMods commented 4 years ago

From studying the source and initial trial and error, I've learned that -m0=flzma2:a(0-3) sets compression strategy, dictionary is indeed limited to 1024m (why?!?), fast bytes top out at 273, and matchfinder cycles go up to mc64. Compression levels seem to be set via standard -mx(0-9) switches, analysis levels -myx(0-9) remain original(?), same seems to be the case for literal context, literal position, and position bits. Will study source further for additional parameters to fiddle with.

On a side note: Wow, this thing is fast!!! Minimal memory use and very good compression.

Amazing job, Conor! Thank you for this gift to the world! =)

conor42 commented 4 years ago

Thanks for your comments :)

You must be referring to the 7-zip-zstd implementation. Yes, there isn't much documentation that I recall. The Fast LZMA2 library documentation combined with the source for the 7-zip interface should cover everything. The 1024 MB dictionary limit is a legacy of configuration code from Zstandard, which accepts only logarithmic sizes. I have updated the Fast LZMA2 library to fix this, but FL2_DICTSIZE_MAX still limits it to 1024 MB. This needs to be fixed and tested.
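As a rough illustration of the "logarithmic sizes" point: a setting that stores the dictionary size as a base-2 logarithm can only express powers of two, so a request such as 1536 MB cannot be represented exactly. The Python sketch below is not the library's code; the helper names `is_power_of_two` and `floor_log2_size` are hypothetical and exist only for this example.

```python
MB = 1 << 20  # one mebibyte in bytes


def is_power_of_two(n: int) -> bool:
    """True if n is an exact power of two."""
    return n > 0 and (n & (n - 1)) == 0


def floor_log2_size(n: int) -> int:
    """Largest power of two not exceeding n, i.e. the closest size
    a log2-only setting could express without going over."""
    if n <= 0:
        raise ValueError("size must be positive")
    return 1 << (n.bit_length() - 1)


requested = 1536 * MB  # the 1.5 GB dictionary discussed in this thread
print(is_power_of_two(requested))          # 1536 MB is not a power of two
print(floor_log2_size(requested) // MB)    # size in MB a log2 setting falls back to
```

Under this assumption, any size between 1024 MB and 2048 MB collapses to 1024 MB unless the configuration accepts a byte count instead of a logarithm.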

ApexMods commented 4 years ago

I'm using your excellent library inside the p7zip dev branch (https://github.com/szcnick/p7zip). Had to wrangle with the source a bit, but finally got it to compile on macOS. I am truly amazed at the speed gains and almost laughably low memory requirements for multithreading. Beautiful.

ApexMods commented 4 years ago

Oh, and yes - the dictionary limit is the only thing holding it back. I'm maxing settings with 1G dictionary on 16 threads here on my machine, and it's barely using 7GB of memory. A dictionary size of 2GB would be perfect to compensate for the slightly lower compression ratio (when compared to 1.5GB dictionary in memory-munching "slow" LZMA).

ApexMods commented 4 years ago

So, just for the fun of it, I compiled again with a modified 2GB dictionary limit (no other modifications). On a 20GB corpus, the compression ratio improved significantly: archive size went from 5.45GB (with 1GB dictionary) to 5.18GB (with 2GB). Compression time went up from 29 to 35 minutes, memory usage from 7GB to 14GB. Will try increasing radix cycles next. :)
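To put the figures above in relative terms, here is a small Python sketch that computes the percentage changes. It uses only the numbers quoted in this comment; the helper name `pct_change` is made up for the example.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


corpus_gb = 20.0

# Figures reported for the 1 GB and 2 GB dictionary runs.
size_1g, size_2g = 5.45, 5.18      # archive size in GB
time_1g, time_2g = 29.0, 35.0      # compression time in minutes
mem_1g, mem_2g = 7.0, 14.0         # memory usage in GB

print(f"ratio, 1 GB dict: {size_1g / corpus_gb:.4f}")
print(f"ratio, 2 GB dict: {size_2g / corpus_gb:.4f}")
print(f"archive size change: {pct_change(size_1g, size_2g):+.2f}%")
print(f"time change:         {pct_change(time_1g, time_2g):+.2f}%")
print(f"memory change:       {pct_change(mem_1g, mem_2g):+.2f}%")
```

In short, the archive shrinks by roughly 5% in exchange for about 21% more time and twice the memory.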

conor42 commented 4 years ago

Nice :) In theory LZMA2 only supports a dictionary up to 1.5 GB, but in practice the decoder can handle more than this. There may be decoders which balk at 2 GB. I'll try to find out if raising it will cause any trouble.

ApexMods commented 4 years ago

Thanks, Conor. 👍🏻

With only 16 GB RAM to test on, memory pressure was too high for extensive testing of the 2 GB dictionary (although decoding with e.g. standard p7zip 16.02 worked flawlessly). Running at 1.5 GB dictionary now (don't know how to set that limit in the source, as it's defined as 2^n). Raising match finder cycles did not improve compression, so my max compression command currently looks like this:

7za a -mx -myx -ms1024t -mqs -m0=flzma2:a3:d1536m:fb273:mc64:mt16

Haven't figured out how exactly compression level (x), compression analysis (yx), and compression mode (a) influence each other in flzma2. Your source comments mention 3 compression modes, each with 10 different compression levels, and a plethora of other parameters which appear to be unalterable via command line switches.

While we're at it, what would the absolute fastest compression setting be? I'm currently using

7za a -mx1 -m0=flzma2:a0

(which performs north of 200GB/h, btw) but I have a feeling there's room for further improvement, as CPU load won't nearly reach 100% on that setting, and it's not disk speed bound, either.
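For comparison with disk and CPU throughput figures, the 200GB/h number can be converted to MB/s. This is a quick Python sketch assuming decimal units (1 GB = 1000 MB); the function name `gb_per_hour_to_mb_per_s` is hypothetical.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert GB/h to MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000.0 / 3600.0


rate = gb_per_hour_to_mb_per_s(200.0)
print(f"{rate:.1f} MB/s")
```

That works out to a little under 56 MB/s, which is consistent with the observation that the run is not disk speed bound.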

conor42 commented 4 years ago

The 10 compression levels are made by combining a mode setting with a number of other settings, so there aren't 10 per mode. The problem with levels is that the best combination of settings depends on the type of data, so results may not be consistent when comparing levels 1-10.

Fast compression will probably never compare well with hashing strategies because the algorithm has no advantage on small dictionaries. Also the initial step at depth 0 is single threaded.

ApexMods commented 4 years ago

Still unclear on some things. Does mode (a0-3) override level (x0-10), or is it the other way around? Does analysis level (yx) have any effect at all? Getting inconclusive results here, so a bit confused.