Frommi / miniz_oxide

Rust replacement for miniz
MIT License
168 stars 48 forks source link

miniz_oxide doesn't validate HLIT #130

Closed fintelia closed 1 year ago

fintelia commented 1 year ago

The deflate spec requires that dynamic blocks contain at most 286 length/literal codes, despite the encoding allowing values up to 288 to be represented:

We can now define the format of the block:

5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286) 5 Bits: HDIST, # of Distance codes - 1 (1 - 32) 4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)

Unfortunately, this crate doesn't validate the requirement when decoding, so blocks with 287 or 288 length/literal codes are also accepted.

Example file (Make sure to extract the zip file and then feed bad.bin into miniz_oxide. GitHub doesn't allow uploading .bin files directly.)

oyvindln commented 1 year ago

What do zlib and C miniz when given this data? People often expect corrupted/non-conforming stuff to work regardless so if it e.g decompresses in zlib we might want to let it work as well.

fintelia commented 1 year ago

I'm not sure about C miniz, but zlib rejects this case and even clarified in an issue that the handling is intentional.

oyvindln commented 1 year ago

Okay, then we should also reject it.

oyvindln commented 1 year ago

When adding some code for this it looks like the test file has HLIT 2 and HDIST 23 (i.e 259 litlen codes 24 dist codes) so if it's supposed to fail it might be for something else. Does it decode fine with zlib? (miniz_oxide decodes it and seems to be raw deflate not zlib wrapped)

The code seems to already reject inputs with 32 distance codes (tested the bogus_data test in inflate/core.rs) though by checking some total number rather than HDIST directly (it was ported from miniz so some parts are still a tad cryptic). Would need to craft something with 287 or 288 litlen codes to see if that would be rejected too.

fintelia commented 1 year ago

I probably should have clarified that the bad.bin file is a zlib wrapped deflate stream. The first bytes of the file are 78 01 FD E0 01, of which the first two are a zlib header. Of the next couple:

   FD       E0       01
-------- -------- --------
11111101 11100000 00000001
       1                   BFINAL=Final Block
     10                    BTYPE=Dynamic Huffman Codes
11111                      HLIT=31+257=288
            00000          HDIST=0+1=1
         111             1 HCLEN=15+4=19

If it is helpful, this is the code I used to generate the header.

oyvindln commented 1 year ago

Ah it's probably correct, checked the source now and I was being a doofus and forgetting to change the filename in the test case I made. Will look into it with the right file later.