fhanau / Efficient-Compression-Tool

Fast and effective C++ file optimizer
Apache License 2.0

"Improve lodepng" dc1033a makes ECT much slower #4

Closed AlyoshaVasilieva closed 8 years ago

AlyoshaVasilieva commented 8 years ago

Comparing builds of dc1033a206c48df8157e61ad4e745b136673f5a6 and 83a1fd9589be2b2ec8f1eb5d5912fb7ef4af42aa shows that, at least for me, dc1033a206c48df8157e61ad4e745b136673f5a6 is much slower.

I compile with GCC 5.3.0 (msys2) on Windows 7 x64, with "-flto -march=native -mtune=native" added to C(XX)FLAGS, on an AVX2-capable Intel CPU. The resulting binary of dc1033a206c48df8157e61ad4e745b136673f5a6 is approximately 2-3x slower than that of 83a1fd9589be2b2ec8f1eb5d5912fb7ef4af42aa. I am testing on large (7 MB, ~8 megapixel) 24-bit PNGs.

Command line used: ect -9 --strict --mt-deflate=12 image.png. Hopefully this can be fixed.

Thanks for making ECT.
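
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes that ECT rewrites its input file in place, so each run works on a fresh copy. The helper name `time_on_copy` and the binary paths in the usage comment are hypothetical; only the command-line flags come from the report above.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def time_on_copy(argv, input_path):
    """Copy input_path to a scratch directory, run argv on the copy,
    and return the elapsed wall-clock time in seconds."""
    with tempfile.TemporaryDirectory() as scratch:
        copy = Path(scratch) / Path(input_path).name
        shutil.copy(input_path, copy)  # keep the original file untouched
        start = time.perf_counter()
        subprocess.run(
            [*argv, str(copy)],
            check=True,  # raise if the tool exits with an error
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return time.perf_counter() - start


# Hypothetical usage; the binary paths are placeholders for the two builds:
# for exe in ("./ect-83a1fd9", "./ect-dc1033a"):
#     print(exe, time_on_copy([exe, "-9", "--strict", "--mt-deflate=12"], "image.png"))
```

Running each build on its own copy of the same image keeps the second run from starting with an already optimized file.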

fhanau commented 8 years ago

Thank you for the report, can you send me an example picture?

AlyoshaVasilieva commented 8 years ago

I cannot distribute my test images, but I can reproduce the slowdown, to a lesser extent, on this 4000x2000 gradient from Photoshop: https://i.imgur.com/dktlptr.png

83a1fd9589be2b2ec8f1eb5d5912fb7ef4af42aa -9 speed: 150 seconds
dc1033a206c48df8157e61ad4e745b136673f5a6 -9 speed: 191 seconds

83a1fd9589be2b2ec8f1eb5d5912fb7ef4af42aa -5 speed: 62 seconds
dc1033a206c48df8157e61ad4e745b136673f5a6 -5 speed: 83 seconds

Both commits create identical output.
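
To put the timings above in relative terms, the following sketch computes the slowdown for each compression level. The function name `slowdown_percent` is illustrative only; the numbers are the ones reported in this comment.

```python
def slowdown_percent(baseline_s: float, candidate_s: float) -> float:
    """Return how much slower the candidate is than the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# Timings in seconds for the 4000x2000 gradient: (83a1fd9, dc1033a)
timings = {
    "-9": (150.0, 191.0),
    "-5": (62.0, 83.0),
}

results = {
    level: slowdown_percent(old, new) for level, (old, new) in timings.items()
}

for level, pct in results.items():
    print(f"ect {level}: dc1033a is {pct:.1f}% slower than 83a1fd9")
# ect -9: dc1033a is 27.3% slower than 83a1fd9
# ect -5: dc1033a is 33.9% slower than 83a1fd9
```

On this image the regression is roughly 27-34%, well below the 2-3x seen on the private test images, which matches the "to a lesser extent" remark.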

ghost commented 8 years ago

Great improvement for 0.3: it's faster and it compresses better, and it's even better without dc1033a. According to my tests, this commit makes ECT slower (for the same results).

fhanau commented 8 years ago

I tested the image. The commit does indeed decrease performance, but only on Windows. On OS X, the performance stays the same (tested with gcc and clang). The problem appears to be in the new lodepng_inflate function, which used to call lodepng's inflate implementation but now uses zlib's.

fhanau commented 8 years ago

It should have the old performance now, as the patch has been partially reverted.

ghost commented 8 years ago

for ect -3:

v0.3 without dc1033a
3,23 MB (3 395 870 Bytes) --> 43.165s
3,24 MB (3 400 691 Bytes) --> 41.652s (--mt-deflate)

v0.3 with e438159
3,23 MB (3 395 870 Bytes) --> 39.811s
3,24 MB (3 400 691 Bytes) --> 39.171s (--mt-deflate)


for ect -7:

v0.3 without dc1033a
3,16 MB (3 315 624 Bytes) --> 132.491s
3,16 MB (3 315 624 Bytes) --> 132.398s (--mt-deflate)

v0.3 with e438159
3,16 MB (3 315 624 Bytes) --> 130.370s
3,16 MB (3 315 624 Bytes) --> 130.338s (--mt-deflate)