Use fpnge PNG encoder instead of stb_image

AuburnSounds / gamut

Image encoding and decoding library for D. Detailed layout control. Experimental codec QOIX.

Boost Software License 1.0


Use fpnge PNG encoder instead of stb_image #33

Closed by p0nce 1 year ago

p0nce commented 1 year ago

fpng.cpp compression compared to stb_image_write.h: 12-19x faster with roughly 5-11% avg. smaller files.

Also: this opens an avenue for fast PNG loading when the file was generated by fpng, but then we would need two PNG decoders.
Can it do 16-bit encode? It seems to support only rgb8 and rgba8. 16-bit encode is more important than speed, since right now only QOIX supports 16-bit encodes.

veluca93 commented 1 year ago

You could also consider https://github.com/veluca93/fpnge/, which can do 16-bit encode and is also pretty fast ;) (disclaimer: I'm the author)

p0nce commented 1 year ago

That's pretty cool, I would definitely like to have 1- and 2-channel encodes somehow. I shall implement AVX2 completely in intel-intrinsics for that, which would make it fast on Apple Silicon too. (EDIT: I mean, if this is the road taken)

For the kind of requirements we have (in the audio plugin world):

PNG encoding size can be important, but we can use any encoder at that stage
load time is very important
temporary memory used while loading is kinda important
complexity is kinda important, since D translations are easier to integrate (though that is changing with ImportC; eventually it will be better in a D context to simply leave things as C)

p0nce commented 1 year ago

16-bit PNG encode is definitely needed to move away from the legacy 8-bit elevation map in Dplug (otherwise one converts the 8-bit PNG to 10-bit QOIX for better speed/ratio, but can't go back to a newer 16-bit PNG (then to .xcf) to become the new asset reference).

p0nce commented 1 year ago

translating fpnge isn't easy but we need to push through

p0nce commented 1 year ago

Now translated, 8-bit output seems to work. (EDIT: no)

[ ] Compare speed against stb_image_write for 8b
[ ] Compare memory consumption against stb_image_write for 8b
[ ] Compare stack usage against previous
[ ] Compare build time against previous
[ ] PNG output is buggy in both 8-bit and 16-bit, but not for all 8-bit images. Translation didn't work well :(

p0nce commented 1 year ago

To repro that diff, decode and re-encode that PNG with gamut. [attached images: crop, crop-out]

p0nce commented 1 year ago

Output of pngcheck:

File: crop-out.png (4501 bytes)
  chunk IHDR at offset 0x0000c, length 13
    434 x 63 image, 32-bit RGB+alpha, non-interlaced
  chunk IDAT at offset 0x00025, length 4444
    zlib: deflated, 256-byte window, superfast compression
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      1 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0
    zlib: inflate error = -3 (data error)
 (56 out of 63)
ERRORS DETECTED in crop-out.png

On https://www.nayuki.io/page/png-file-chunk-inspector it says "Adler-32 mismatch"

p0nce commented 1 year ago

Not easy, original fpnge encodes correctly. When the chunk is decoded with std.zlib, the error returned is indeed Z_DATA_ERROR
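To separate an "Adler-32 mismatch" from a damaged deflate stream, one option is to inflate the IDAT payload as raw deflate (so zlib does not verify the checksum itself) and compare the trailer by hand. The following is a minimal Python sketch of that idea, not what gamut or pngcheck does internally; `iter_chunks` and `check_idat_adler` are hypothetical helper names. If the deflate data itself is corrupt, `zlib.error` is raised instead, which corresponds to the Z_DATA_ERROR case.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png_bytes):
    """Yield (chunk_type, chunk_data) for each chunk of a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield ctype, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def check_idat_adler(png_bytes):
    """Return (stored, computed) Adler-32 for the concatenated IDAT stream."""
    stream = b"".join(d for t, d in iter_chunks(png_bytes) if t == b"IDAT")
    # Skip the 2-byte zlib header and inflate as raw deflate (wbits=-15),
    # so the Adler-32 trailer is left over instead of being verified.
    inflater = zlib.decompressobj(wbits=-15)
    raw = inflater.decompress(stream[2:]) + inflater.flush()
    trailer = inflater.unused_data[:4]
    if len(trailer) < 4:
        raise ValueError("zlib stream is missing its Adler-32 trailer")
    stored = struct.unpack(">I", trailer)[0]
    computed = zlib.adler32(raw) & 0xFFFFFFFF
    return stored, computed
```

Calling `check_idat_adler(open("crop-out.png", "rb").read())` would then show whether only the checksum is wrong (two different values returned) or whether the compressed data itself cannot be inflated (exception).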

p0nce commented 1 year ago

So it was way easier to add 16-bit support to the stb_image_write.h translation than to fix our translation of fpnge.

I also ran a few compression experiments on the 16-bit test suite:

Varying compression level

filter -1 q8 (baseline)
TOTAL  decode mpps   encode mpps      bit-per-pixel
             62.71          8.74           22.62820

filter -1 q7
TOTAL  decode mpps   encode mpps      bit-per-pixel
             62.16          9.11           22.66241

filter -1 q6
TOTAL  decode mpps   encode mpps      bit-per-pixel
             63.75          9.87           22.70483

filter -1 q5
TOTAL  decode mpps   encode mpps      bit-per-pixel
             63.22         10.72           22.76041

Choose predictor by zipping the residual on each line (without the preceding line, mind you).

TOTAL  decode mpps   encode mpps      bit-per-pixel
            114.49          1.13           22.28941    // encoding is roughly 10x slower for just ~2% smaller size
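The per-line heuristic described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the code from gamut or stb_image_write: the function names are hypothetical, and zlib stands in for whatever deflate implementation the encoder uses. Each of the five PNG filters is applied to the current scanline, each residual is compressed on its own (without the preceding line's residual), and the filter with the smallest compressed size wins.

```python
import zlib


def paeth(a, b, c):
    """Paeth predictor from the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def filter_row(ftype, row, prev, bpp):
    """Apply PNG filter ftype (0..4) to row; prev is the unfiltered row above."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0   # left neighbour
        b = prev[i]                           # neighbour above
        c = prev[i - bpp] if i >= bpp else 0  # upper-left neighbour
        if ftype == 0:
            pred = 0
        elif ftype == 1:
            pred = a
        elif ftype == 2:
            pred = b
        elif ftype == 3:
            pred = (a + b) // 2
        else:
            pred = paeth(a, b, c)
        out[i] = (x - pred) & 0xFF
    return bytes(out)


def choose_filters(rows, bpp, level=6):
    """Pick, per scanline, the filter whose residual compresses smallest.

    rows: list of equal-length bytes objects; bpp: bytes per pixel.
    Returns (list of filter types, filtered data ready for deflate).
    """
    prev = bytes(len(rows[0]))  # the row above the first row is all zeros
    chosen, filtered = [], []
    for row in rows:
        candidates = [filter_row(f, row, prev, bpp) for f in range(5)]
        sizes = [len(zlib.compress(c, level)) for c in candidates]
        best = min(range(5), key=sizes.__getitem__)
        chosen.append(best)
        filtered.append(bytes([best]) + candidates[best])
        prev = row
    return chosen, b"".join(filtered)
```

Running five trial compressions per scanline is what makes this approach so much slower at encode time, which is consistent with the encode speed drop reported above.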

The smallest output would probably come from keeping the last two scanlines of residual, running the zip evaluation on those two, and using a dynamic programming algorithm to find all the filters for the image. But I can't be bothered to do that.

p0nce commented 1 year ago

Done and happy