facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
23.68k stars 2.1k forks source link

Extra bits on block are not reported as error. #3089

Open klauspost opened 2 years ago

klauspost commented 2 years ago

Describe the bug

Extra bits on stream does not get reported, even after #1598

To Reproduce

Decompress: fea5d210d01530bfeb0130452ef432734b23e744.zst.gz Execute zstd -d fea5d210d01530bfeb0130452ef432734b23e744.zst

Expected behavior

The sample contains 15 extra bits when the sequence has been decoded. These should be reported as an error.

zstd with debugging reports:

zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue (srcSize:17)
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: case ZSTDds_decompressBlock
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: case bt_compressed
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressBlock_internal (size : 17)
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeLiteralsBlock
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeLiteralsBlock : cSize=6, nbLiterals=5
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeSeqHeaders
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences_body
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=44 using 6 bits
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=0 using 5 bits
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=18 using 6 bits
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=1, matchL=62, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 63
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=2, matchL=62, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 64
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=2, matchL=33, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 35
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences_body: after decode loop, remaining nbSeq : 0
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from block : 162
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from frame : 162
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from frame : 162
zstd-git\lib\decompress\zstd_decompress.c: stage zdss_read
zstd-git\lib\decompress\zstd_decompress.c: neededInSize = 0
fea5d210d01530bfeb0130452ef432734b23e744.zst: 162 bytes

Also tested with v1.5.2 release.

It seems we "agree" on decompression:

2022/03/09 12:13:06 Literals: 5 hash: 51174239790196767 and 3 sequences.
2022/03/09 12:13:06 Seq 0 Litlen: 1 mo: 1 (abs) ml: 62 . bits remain: 49
2022/03/09 12:13:06 Seq 1 Litlen: 2 mo: 1 (abs) ml: 62 . bits remain: 31
2022/03/09 12:13:06 Seq 2 Litlen: 2 mo: 1 (abs) ml: 33 . bits remain: 15
2022/03/09 12:13:06 error after: 15 extra bits on block, should be 0

Desktop (please complete the following information):

Edit: Second example, 0 sequences, but 4 bytes left on stream after literal decoding: 447902c8e4faf807f9b8f9c1861abe7990c476dd.zst.gz

terrelln commented 2 years ago

Thanks for the bug report!

We'll aim to fix this one, but we're okay with not reporting all types of corruption. Our policy is that if you want error detection, enable checksumming. That said, we still like to report corruption whenever we can.

The differential fuzzing you're doing is super interesting! Glad to see that there is another robust enough implementation out there that differential fuzzing can reveal bugs in the reference decoder!

klauspost commented 2 years ago

One thing I've learnt from the decoder is that it checks everything :)

This should be a good (but not reliable) corruption detection method. Of course not as reliable as CRC, but it can also be seen as a good supplement for the chance of the 32 bit CRC colliding, just like the other checks.

Yeah. It is not a clean-room implementation, but still based on the spec and with peeks at the code for the "not-clear-enough-for-me" parts. Most of the remaining mismatches are because of how the wrapper is implemented.