Closed pete4abw closed 3 years ago
Size | Name | Time | Comp Index | Time Index | Overall Index | Rank |
---|---|---|---|---|---|---|
183,084,179 | LRZIP_L3 | 00:24.920 | 86.99% | 15.13% | 51.06% | 1 |
178,224,370 | LRZIP_L4 | 00:29.610 | 84.68% | 17.97% | 51.33% | 2 |
192,488,812 | LRZIP_L2 | 00:21.330 | 91.46% | 12.95% | 52.20% | 3 |
210,463,000 | LRZIP_L1 | 00:19.620 | 100.00% | 11.91% | 55.95% | 4 |
150,801,305 | LRZIP_L5 | 01:54.950 | 71.65% | 69.77% | 70.71% | 5 |
149,548,952 | LRZIP_L6 | 01:58.610 | 71.06% | 71.99% | 71.53% | 6 |
145,853,453 | LRZIP_L7 | 02:14.720 | 69.30% | 81.77% | 75.54% | 7 |
145,288,873 | LRZIP_L8 | 02:40.810 | 69.03% | 97.61% | 83.32% | 8 |
145,105,437 | LRZIP_L9 | 02:44.750 | 68.95% | 100.00% | 84.47% | 9 |
Here it's clear that lrzip results can be split into two sections. Levels 1-4 had very fast times, between 19 and 29 seconds. Levels 5-9 had slower times between 1:54 and 2:44. In the first group, even though the time index was between 11.9% and 18.0%, the compression index was between 84.7% and 100.0%. So, with level 4, you get a 15 point improvement in compression over level 1 with only a 6 point drop in time. A good trade.
In the second group, the time index varies by 30 points, 70-100, yet the compression index only varies slightly, between 72 and 69. Here, you only get a 3 point improvement in compression between levels 5 and 9, but with a time penalty of 30 points! Between levels 6 and 7 there is a roughly 1.8 point improvement in compression, but a 10 point drop in time. This is why level 6 has a higher ranking than 7.
This tells me that if speed is important, choose level 4. If best compression for time is important, choose levels 6 or 7.
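The point differences quoted above can be checked with a few lines of awk. This is only a sketch: `step_tradeoff` is a hypothetical helper name, and the input rows are the Comp Index and Time Index columns copied from the table for a handful of levels.

```sh
#!/bin/sh
# Sketch only: step_tradeoff is an illustrative helper, not part of lrzip.
# Input lines: "level comp_index time_index" (percent values from the table).
# For each consecutive pair of rows, print how many points of compression
# were gained and how many points of time that cost.
step_tradeoff() {
    awk '
    NR > 1 {
        printf "%s->%s comp_gain=%.2f time_cost=%.2f\n", prev, $1, pc - $2, $3 - pt
    }
    { prev = $1; pc = $2; pt = $3 }'
}

step_tradeoff <<'EOF'
L1 100.00 11.91
L4 84.68 17.97
L5 71.65 69.77
L6 71.06 71.99
L7 69.30 81.77
L9 68.95 100.00
EOF
```

The first line printed is `L1->L4 comp_gain=15.32 time_cost=6.06`, and the level 6 to level 7 step comes out as 1.76 points of compression for 9.78 points of time.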
Here's a little shell script you can run. If running as root, uncomment the drop_caches line. This will flush memory caches which, along with sync, will give a truer speed comparison.
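#!/bin/sh
# lrzip speed test: compress one file at every level, 1 through 9
# if running as root, uncomment drop_caches line
usage() {
    echo "LRZIP Speed Test"
    echo "usage: $0 filename"
    exit 1
}
# quote "$1" so an empty argument or a filename with spaces is handled
[ -z "$1" ] && usage
for i in 1 2 3 4 5 6 7 8 9
do
    sync
    sleep 1
    # echo 3 >/proc/sys/vm/drop_caches
    # sleep 1
    # each level gets its own output suffix, .<level>.lrz
    lrzip -L$i -S.$i.lrz "$1" || break   # stop at the first failure
done
exit 0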
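To turn runs like these into numbers that can be indexed, each run needs its output size and elapsed time recorded. The following is a minimal sketch under a few assumptions: `measure` is a hypothetical helper, `date +%s` is used for timing (one-second resolution, widely available but not required by POSIX), and the output file is assumed to be named by appending the suffix to the input filename.

```sh
#!/bin/sh
# Sketch only: measure is an illustrative helper, not part of lrzip.
# Usage: measure LABEL OUTFILE COMMAND [ARGS...]
# Runs COMMAND, then prints "LABEL size_in_bytes elapsed_seconds".
measure() {
    label=$1
    outfile=$2
    shift 2
    start=$(date +%s)
    "$@" || return 1
    end=$(date +%s)
    # tr strips the padding some wc implementations add
    size=$(wc -c < "$outfile" | tr -d ' ')
    echo "$label $size $((end - start))"
}

# Possible use inside the loop above (assumes the output is "$1.$i.lrz"):
#   measure "LRZIP_L$i" "$1.$i.lrz" lrzip -L$i -S.$i.lrz "$1" || break
```

Whole-second timing is coarse for the fastest levels on small files, so a larger test file gives more meaningful time differences.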
Just closing
I, like so many, tend to obsess over how well lrzip compresses and am always looking for ways to compare. There are two major benchmarks for comparison: compressed size and time to compress.
But how do you get an overall picture of the benefit of one compression method over another? How do you assess whether additional compression is worth additional time?
So, I decided on a (perhaps) mathematically shaky method of creating a compression index and time index for different compression methods and combining the two.
The long form of this analysis is here. [Edited to point to new version in main branch]
But here is a snippet which explains the methodology.
...an attempt is made to create an overall index and Rank for each method. For this the Compression Index and Time Index are ADDED and then divided by 2 to make the Overall Index scale to 100%. The lower the number the better!
The compression index is computed by comparing the size of a compressed file to the maximum (worst) size of all methods,
MYSIZE/MAX(ALLSIZES)
and the time index by comparing the time to compress to the maximum (slowest) time to compress,
MYTIME/MAX(ALLTIMES)
The worst compression ratio will have an index of 100%. The slowest time to compress will have an index of 100%. All other compression and time indices will be relative to the worst compression and slowest time.
Example
Compression size: 100
Worst Compression size: 120
Compression Index: 100/120 = 83.33% (percent relative to largest compressed size)
Time to Compress: 60 seconds
Slowest Time to Compress: 320 seconds
Time Index: 60/320 = 18.75% (percent relative to the slowest compression time)
Combined index: (83.33+18.75)/2 = 51.04% This number can be compared to all others in the set.
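The calculation above can be sketched as a small awk pipeline that indexes a whole set of results and sorts them by overall index, best first. The function name `index_table` and the `name size seconds` input layout are assumptions made for illustration. The `example` row uses the numbers from the worked example; the other two rows are made-up filler that only supply the worst size (120) and the slowest time (320).

```sh
#!/bin/sh
# Sketch only: index_table and the "name size seconds" layout are
# illustrative assumptions, not part of lrzip.
# Output lines: "name comp_index time_index overall_index", lowest overall first.
index_table() {
    awk '
    {
        name[NR] = $1; size[NR] = $2; secs[NR] = $3
        if ($2 + 0 > maxsize + 0) maxsize = $2
        if ($3 + 0 > maxsecs + 0) maxsecs = $3
    }
    END {
        for (i = 1; i <= NR; i++) {
            ci = 100 * size[i] / maxsize    # MYSIZE/MAX(ALLSIZES)
            ti = 100 * secs[i] / maxsecs    # MYTIME/MAX(ALLTIMES)
            printf "%s %.2f %.2f %.2f\n", name[i], ci, ti, (ci + ti) / 2
        }
    }' | LC_ALL=C sort -k4,4n
}

index_table <<'EOF'
example 100 60
largest 120 30
slowest 110 320
EOF
# first line printed: example 83.33 18.75 51.04
```

Times have to be given in seconds (for example 164.75 rather than 02:44.750) for the division to work.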
Highlights of 11 different compression methods
(top 3 here) The differences were small between the top three in compression but the associated times differed by more than double!
If we index these results, comparing them to one another:
Blending time and compression, LRZIP using LZMA comes out on top with an overall index score of 72%, vs 86% and 97% for the ZPAQ variants. Even though it had the worst compression ratio of the three, it had the best time by far, hence the better overall score.
How do you use this?
There is no best way. Obviously for smaller files, the important benchmark is time. For larger files, the important criterion is compression. Text files will generally compress faster than binary. With storage costs decreasing, SDRAM becoming faster and faster, and processing power ever-increasing, individual needs and requirements will vary. Hopefully, this may help.