Closed jrmuizel closed 8 years ago
How many times of files did you test? libdeflate uses an entirely different algorithm at level 8 and above; it will provide a better compression ratio than the faster algorithm (which is similar to zlib's algorithm) most of the time, but it will, inevitably, be possible to find an input where the opposite is true, if only slightly. There is some room for improvement in this area though.
libdeflate did better on some of the other files that I tried. Interestingly enough, zlib-gzip -8 also beats libdeflate-gzip -8: 15113 vs 15901
Well, as I mentioned, libdeflate switches to a different algorithm at level 8 and above, and that is causing the difference for this particular file. At level 7, libdeflate uses an algorithm similar to zlib, so the difference disappears (and libdeflate actually does slightly better than zlib).
Apparently, this file is from the test data set checked into the repository for snappy. For some context, here's some more gzip compressed size stats from that data set (minimum size of each file bolded):
File | zlib -7 | libdeflate -7 | zlib -9 | libdeflate -9 | libdeflate -12 |
---|---|---|---|---|---|
alice29.txt | 54278 | 54273 | 54191 | 52076 | 51941 |
asyoulik.txt | 48863 | 48880 | 48829 | 46856 | 46729 |
baddata1.snappy | 22936 | 23002 | 22936 | 22846 | 22844 |
baddata2.snappy | 23016 | 23081 | 23016 | 22925 | 22917 |
baddata3.snappy | 23721 | 23786 | 23721 | 23629 | 23623 |
fireworks.jpeg | 122942 | 123078 | 122942 | 123046 | 123043 |
geo.protodata | 15226 | 15077 | 15113 | 15824 | 15338 |
html | 13692 | 13740 | 13589 | 13773 | 13261 |
html_x_4 | 53353 | 53392 | 52934 | 53938 | 51799 |
kppkn.gtb | 38315 | 38270 | 37633 | 35027 | 34515 |
lcet10.txt | 144585 | 144526 | 144429 | 138876 | 138337 |
paper-100k.pdf | 81237 | 81259 | 81211 | 81099 | 81052 |
plrabn12.txt | 194610 | 194883 | 194277 | 185762 | 185711 |
urls.10K | 222217 | 222151 | 220207 | 217978 | 213738 |
As you can see, libdeflate's strongest compression mode usually beats zlib, sometimes even by a significant margin, but some files such as geo.protodata are outliers.
There are always going to be some outliers but I have some ideas in mind that might improve the situation.
I thank for your project! I wanted to ask one more question. How it is possible to improve compression?
libdeflate | 7-zip 16.00 | Ratoi | |
---|---|---|---|
gxkkrnb.png | 605 371 bytes | 595 885 bytes | -1,5% |
7z -tgzip -mx=9 -mfb=258 -mpass=2
As with the other files that have been pointed out, a better block splitting algorithm is needed.
you plan to introduce this function in project?
I have released libdeflate v0.2, which includes an improved block splitting algorithm as well as some other improvements. As a result, libdeflate now compresses more, or is faster, than before on most files.
Here's an updated version of the table I posted previously, with "gxkkrnb.png" added:
File | zlib -6 | zlib -9 | libdeflate -6 | libdeflate -9 | libdeflate -12 |
---|---|---|---|---|---|
alice29.txt | 54416 | 54182 | 54408 | 51753 | 51560 |
asyoulik.txt | 48909 | 48790 | 48771 | 46654 | 46444 |
baddata1.snappy | 22953 | 22953 | 22976 | 22789 | 22783 |
baddata2.snappy | 23035 | 23035 | 23059 | 22869 | 22863 |
baddata3.snappy | 23730 | 23730 | 23769 | 23569 | 23566 |
fireworks.jpeg | 122835 | 122835 | 123010 | 122974 | 122974 |
geo.protodata | 15143 | 14986 | 15237 | 15306 | 14887 |
html | 13711 | 13570 | 13863 | 13667 | 13236 |
html_x_4 | 53379 | 52750 | 54237 | 53392 | 51474 |
kppkn.gtb | 38763 | 37665 | 38802 | 34840 | 34324 |
lcet10.txt | 144916 | 144451 | 144801 | 138014 | 137602 |
paper-100k.pdf | 81324 | 81274 | 81033 | 80933 | 80799 |
plrabn12.txt | 195273 | 194344 | 195510 | 185414 | 184492 |
urls.10K | 222625 | 218920 | 223338 | 216715 | 212447 |
gxkkrnb.png | 706109 | 701174 | 665418 | 604853 | 596797 |
For now I did not include 7-Zip or Zopfli since libdeflate is not currently intended to consistently match the compression ratio of programs that use the very slow approach of fully evaluating multiple block splits. libdeflate should usually get very close, though, while being much faster.
libdeflate vs ECT
enwik8
Compressor | File Size | Time |
---|---|---|
ect -2 | 35 350 269 byte | 23.478s |
gzip -12 | 35 100 553 byte | 22.608s |
ect -3 | 35 013 879 byte | 30.082s |
ect -4 | 34 967 607 byte | 35.125s |
tell me, is it possible to increase the compression ratio to the level of ect?
No, as I have already noted, despite the recent improvements, libdeflate is still not intended to fully match the compression ratio of Zopfli or Zopfli-derived programs. Also, if you have further questions or enhancement requests, please create a new Github issue. This one is closed.
I get 15113 bytes from zlib and 15338 bytes from libdeflate on this uncompressed version of this file: https://github.com/quixdb/squash-benchmark/blob/master/geo.protodata.xz