DavidBuchanan314 / parallel-png-proposal


Faster Huffman decoding #5

Open richgel999 opened 2 years ago

richgel999 commented 2 years ago

I've realized that your proposal may also be quite useful for accelerating Huffman decoding, like this: https://fastcompression.blogspot.com/2015/10/huffman-revisited-part-4-multi-bytes.html

Modern LZ/Huffman decoders benefit from decoding two streams in parallel on the same thread, i.e. a form of interleaved decoding. Your proposal divides the image into multiple independently decodable streams, which is perfect for this.
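To make the interleaving idea concrete, here is a minimal sketch (not the author's code) of decoding two independent Huffman bitstreams on one thread, alternating one symbol from each per loop iteration so the CPU can overlap the two dependent lookup chains. It assumes the 12-bit maximum code length mentioned below, so a single flat table keyed by the next 12 bits yields (symbol, length); `BitReader`, `TableEntry` and `decode_interleaved` are illustrative names.

```cpp
#include <cstdint>
#include <cstddef>

struct BitReader {
    const uint8_t* buf;
    size_t size;
    size_t byte_pos = 0;
    uint64_t bit_buf = 0;   // LSB-first bit buffer, as in Deflate
    int bit_count = 0;

    void refill() {
        while (bit_count <= 56 && byte_pos < size) {
            bit_buf |= uint64_t(buf[byte_pos++]) << bit_count;
            bit_count += 8;
        }
    }
    uint32_t peek12() const { return uint32_t(bit_buf & 0xFFF); }
    void consume(int n) { bit_buf >>= n; bit_count -= n; }
};

struct TableEntry { uint16_t symbol; uint8_t length; };

// Decode `count` symbols from each of two streams, alternating between them.
// `table` has 1 << 12 entries: one per possible 12-bit window.
void decode_interleaved(BitReader& a, BitReader& b, const TableEntry* table,
                        uint16_t* out_a, uint16_t* out_b, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        a.refill();
        b.refill();
        TableEntry ea = table[a.peek12()];
        TableEntry eb = table[b.peek12()];
        out_a[i] = ea.symbol; a.consume(ea.length);
        out_b[i] = eb.symbol; b.consume(eb.length);
    }
}
```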

If this works well enough, it would need one additional bit of metadata: whether or not the Deflate stream uses pixel-wise LZ with only RLE matches. These are standard Deflate streams, except that all literals occur in groups of 3 or 4 bytes (excluding the PNG filter byte before each scanline), all match lengths are multiples of 3 or 4 bytes, and only match distances of 3 or 4 bytes are allowed. Additionally, the maximum code size is limited to 12 bits, and the dynamic Huffman tables must be the same for each dynamic block.
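As I read the match constraint above, for a bytes-per-pixel value `bpp` of 3 or 4 a match must cover whole pixels and its distance must be exactly one pixel (pixel-wise RLE). A hedged one-liner capturing that check, with an illustrative name:

```cpp
#include <cstdint>

// Assumed reading of the constraint: match length is a whole number of pixels,
// and the only allowed match distance is one pixel (3 or 4 bytes).
static bool is_allowed_match(uint32_t len, uint32_t dist, uint32_t bpp)
{
    return (len % bpp) == 0 && dist == bpp;
}
```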

With these constraints in place and interleaved decoding, it should be possible to construct a very fast PNG decoder without using multithreading. Multithreading would speed it up even more.

richgel999 commented 2 years ago

I have a prototype coming online today (note the FPNG time includes decompression time):

[benchmark screenshot]

It's already 2.2-2.3x faster at decoding than stb_image.h, 10-15x faster at encoding, and ~4x faster at decoding than lodepng. (I suspect it will also be significantly faster than libpng.) This is decoding only a single Huffman stream (one larger IDAT containing a single Huffman block). The stream must be constrained as described above for this to work; otherwise the decoder bails with an error (it can easily detect invalid streams).
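The bail-out implies a simple control flow: try the constrained fast path, and if the stream violates the constraints, redecode with a general-purpose decoder. This is an assumed pattern, not fpng's actual API; `Image`, `decode_constrained_fast_path` and `decode_general` are hypothetical names.

```cpp
#include <cstdint>
#include <cstddef>

struct Image;  // pixel buffer + dimensions, defined elsewhere

bool decode_constrained_fast_path(const uint8_t* png, size_t size, Image& out);
bool decode_general(const uint8_t* png, size_t size, Image& out);

bool decode_png(const uint8_t* png, size_t size, Image& out)
{
    // The fast path validates the constraints as it decodes and returns false
    // on any violation, in which case the general decoder is used instead.
    if (decode_constrained_fast_path(png, size, out))
        return true;
    return decode_general(png, size, out);
}
```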

I will need to add an ancillary PNG chunk for this, so my decoder knows that it can use the faster/simpler decoder.

richgel999 commented 2 years ago

Got faster compression and decompression working: https://twitter.com/richgel999/status/1473595056653279233

I'm writing a special chunk named "fdEC" that hints to our decompressor that the file's IDAT data (its Deflate Huffman tables and LZ matches) follows our constraints.
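For reference, a sketch of emitting such an ancillary chunk (assumed layout, not fpng's actual code). A PNG chunk is a 4-byte big-endian data length, the 4-byte type, the data, then a CRC-32 over the type and data; the lowercase first letter of "fdEC" marks it as ancillary, so decoders that don't recognize it simply skip it. The 1-byte payload here is hypothetical; the real fdEC contents are not specified in this thread.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>
#include <zlib.h>   // for crc32()

static void put_be32(std::vector<uint8_t>& out, uint32_t v)
{
    out.push_back(uint8_t(v >> 24));
    out.push_back(uint8_t(v >> 16));
    out.push_back(uint8_t(v >> 8));
    out.push_back(uint8_t(v));
}

// Append one PNG chunk: length, type, data, CRC-32(type + data).
static void write_chunk(std::vector<uint8_t>& out, const char type[4],
                        const uint8_t* data, uint32_t len)
{
    put_be32(out, len);
    size_t type_start = out.size();
    out.insert(out.end(), type, type + 4);
    out.insert(out.end(), data, data + len);
    uint32_t crc = uint32_t(crc32(0, out.data() + type_start, uInt(4 + len)));
    put_be32(out, crc);
}

// Hypothetical payload: a single flag byte meaning "this file obeys the
// constrained-Deflate rules". Written before the IDAT chunks.
void write_fdEC_chunk(std::vector<uint8_t>& png_bytes)
{
    const uint8_t payload[1] = { 1 };
    write_chunk(png_bytes, "fdEC", payload, sizeof(payload));
}
```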