IridiumIO / CompactGUI

Transparently compress active games and programs using Windows 10/11 APIs
GNU General Public License v3.0
4.72k stars 224 forks

Suggestion: Add average compression to readme and/or wiki #358

Closed MHLoppy closed 8 months ago

MHLoppy commented 9 months ago

I think that including the average reported compression rates would be helpful for at-a-glance information to help make informed decisions about what algorithm is worth using if data for that specific game/program isn't available.

I calculated the average % savings reported in the spreadsheet as follows:

| Algorithm | % Savings |
|---|---|
| XPRESS4K | 25.29% |
| XPRESS8K | 26.80% |
| XPRESS16K | 26.98% |
| LZX | 30.84% |
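
For anyone wanting to reproduce this kind of average, a minimal sketch might look like the following. The row layout `(app_id, algorithm, size_before_bytes, size_after_bytes)` and the sample numbers are assumptions made for illustration; the real spreadsheet's columns may differ, and this averages the per-result percentages rather than weighting by size.

```python
from collections import defaultdict

# Hypothetical rows: (app_id, algorithm, size_before_bytes, size_after_bytes).
# These values are invented for illustration, not taken from the spreadsheet.
SAMPLE_ROWS = [
    (10, "XPRESS4K", 1000, 800),
    (10, "LZX", 1000, 700),
    (20, "XPRESS4K", 2000, 1400),
]


def average_savings(rows):
    """Return {algorithm: mean percent saved} over all submitted results."""
    per_algo = defaultdict(list)
    for _app_id, algo, before, after in rows:
        if before <= 0:
            continue  # skip malformed rows rather than divide by zero
        per_algo[algo].append(100.0 * (before - after) / before)
    return {algo: sum(vals) / len(vals) for algo, vals in per_algo.items()}


if __name__ == "__main__":
    for algo, pct in average_savings(SAMPLE_ROWS).items():
        print(f"{algo}: {pct:.2f}%")
```

Averaging percentages treats a small program and a huge game equally; weighting by size would answer a slightly different question.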

You might have more up to date data to calculate with, but I imagine the results would be similar overall.


Edit: Using the filtered data from the followup (only results that include data for all four algorithms) gives the following results instead, which probably summarize the relevant data better (though still imperfectly!):

| Algorithm | % Savings |
|---|---|
| XPRESS4K | 22.53% |
| XPRESS8K | 24.96% |
| XPRESS16K | 26.91% |
| LZX | 29.58% |

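
The filtering step could be sketched roughly as below. This assumes rows of `(app_id, algorithm, percent_saved)` and assumes that each app contributes one value per algorithm (the mean of its submissions); both are guesses at how the filtered figures might have been produced, not a description of the actual calculation.

```python
from collections import defaultdict

ALGORITHMS = ("XPRESS4K", "XPRESS8K", "XPRESS16K", "LZX")


def filtered_average_savings(rows):
    """Average percent saved per algorithm, using only complete app_ids.

    rows: iterable of (app_id, algorithm, percent_saved) tuples (hypothetical layout).
    An app_id is "complete" when it has at least one result for all four algorithms.
    """
    by_app = defaultdict(dict)
    for app_id, algo, pct in rows:
        by_app[app_id].setdefault(algo, []).append(pct)

    complete = [res for res in by_app.values() if all(a in res for a in ALGORITHMS)]
    if not complete:
        return {}

    averages = {}
    for algo in ALGORITHMS:
        # One value per app: the mean of that app's submissions for this algorithm.
        per_app = [sum(res[algo]) / len(res[algo]) for res in complete]
        averages[algo] = sum(per_app) / len(per_app)
    return averages
```

Restricting to complete apps means every algorithm is averaged over the same set of programs, so the four numbers are directly comparable.
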
Iridium-IO commented 8 months ago

It's actually not so simple. I've got over 50,000 user-submitted compression results (31,993 if you just count version 3), and the compression varies drastically depending on the program/game you compress. Something like Compactor can actually give you an estimate, because it first does a fast partial compression over the data to guess how well it will compress (for example, if a file is 5GB, it will randomly sample 5KB of data) and uses that to build an estimate.
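
The general idea of a sampling-based estimate can be sketched as follows. This is not Compactor's or CompactGUI's code: `zlib` stands in for the Windows XPRESS/LZX compressors (which are not available in the Python standard library), and the function name, chunk size, and chunk count are arbitrary assumptions.

```python
import os
import random
import zlib


def estimate_savings(path, chunk_size=4096, num_chunks=16, seed=None):
    """Estimate the fraction of space saved by compressing random chunks of a file.

    zlib is only a stand-in compressor; real XPRESS/LZX results will differ.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0.0

    rng = random.Random(seed)
    raw_total = 0
    packed_total = 0
    with open(path, "rb") as f:
        for _ in range(num_chunks):
            # Pick a random offset that still leaves room for a full chunk.
            offset = rng.randrange(0, max(1, size - chunk_size + 1))
            f.seek(offset)
            chunk = f.read(chunk_size)
            if not chunk:
                continue
            raw_total += len(chunk)
            # Incompressible data would be stored as-is, so cap at the raw size.
            packed_total += min(len(chunk), len(zlib.compress(chunk)))
    return 1.0 - packed_total / raw_total
```

Calling `estimate_savings("some_file.bin")` returns a value between 0.0 (no savings expected) and just under 1.0. Small samples are fast but noisy, so a real tool would likely tune how much data it reads.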

I was in the process of doing something similar last year before I got distracted and had to put this project aside.

MHLoppy commented 8 months ago

I've used the utility quite a lot (thanks for making it!), so am familiar with the wildly different compression ratios depending on what's being compressed.

Maybe we were coming at it from different perspectives. I wasn't trying to answer "how much will this unknown folder compress?", but rather "is it worth using one of the heavier compression algorithms on this folder?". By visualizing AppIDs that have a result for more than one algorithm, we can observe that the ratios between the algorithms' compression results are similar irrespective of the absolute amount of compression (with only a few outliers).

The key point is that when one algorithm can compress only a little, the next-heaviest algorithm is extremely unlikely to gain substantially more. When an algorithm does quite well, the next-heaviest algorithm will have a higher average (absolute) gain, and so on.

[image: chart of compression results per AppID across algorithms]

Thus, the average values do provide actionable insight on whether an otherwise-unknown folder is likely to be worth compressing using a heavier algorithm.
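
As a small worked example of reading the averages this way, the step-by-step gain of each heavier algorithm can be computed from the filtered averages quoted earlier in this thread. The function name is made up for illustration, and these are averages only; individual folders will land above or below them.

```python
# Filtered average % savings quoted earlier in this thread, lightest to heaviest.
FILTERED_AVERAGES = [
    ("XPRESS4K", 22.53),
    ("XPRESS8K", 24.96),
    ("XPRESS16K", 26.91),
    ("LZX", 29.58),
]


def marginal_gains(averages):
    """Return the extra percentage points gained by each step to a heavier algorithm."""
    return [
        (f"{lighter} -> {heavier}", round(heavy_pct - light_pct, 2))
        for (lighter, light_pct), (heavier, heavy_pct) in zip(averages, averages[1:])
    ]


if __name__ == "__main__":
    for step, gain in marginal_gains(FILTERED_AVERAGES):
        print(f"{step}: +{gain} percentage points")
```

On these averages each step up adds roughly 2 to 3 percentage points, which is the kind of at-a-glance figure that helps decide whether a heavier algorithm is worth it.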

In any case, if you are planning to add in a "pre-compression check" as you've described, then this would be much less important and we can get real information to make a decision for the files in a specific unknown folder instead!

Iridium-IO commented 7 months ago

@MHLoppy Oh, I see what you're saying now! That's quite the impressive chart too.