Frommi / miniz_oxide

Rust replacement for miniz
MIT License
183 stars 50 forks source link

miniz_oxide::inflate::stream::inflate consumes more bytes than it actually used #158

Open link2xt opened 1 month ago

link2xt commented 1 month ago

Here is a failing test:

#[test]
fn partial_decompression_imap() {
    use miniz_oxide::inflate::stream::{inflate, InflateState};
    use miniz_oxide::{DataFormat, MZFlush};

    // Decompresses to
    // "* QUOTAROOT INBOX \"User quota\"\r\n* QUOTA \"User quota\" (STORAGE 76 307200)\r\nA0001 OK Getquotaroot completed (0.001 + 0.000 secs).\r\n"
    let input = vec![
        210, 82, 8, 12, 245, 15, 113, 12, 242, 247, 15, 81, 240, 244, 115, 242, 143, 80, 80, 10,
        45, 78, 45, 82, 40, 44, 205, 47, 73, 84, 226, 229, 210, 130, 200, 163, 136, 42, 104, 4,
        135, 248, 7, 57, 186, 187, 42, 152, 155, 41, 24, 27, 152, 27, 25, 24, 104, 242, 114, 57,
        26, 24, 24, 24, 42, 248, 123, 43, 184, 167, 150, 128, 213, 21, 229, 231, 151, 40, 36, 231,
        231, 22, 228, 164, 150, 164, 166, 40, 104, 24, 232, 129, 20, 104, 43, 128, 104, 3, 133,
        226, 212, 228, 98, 77, 61, 94, 46, 0, 0, 0, 0, 255, 255,
    ];

    let mut inflate_stream = InflateState::new(DataFormat::Raw);
    let mut output = vec![0; 8];
    let result = inflate(&mut inflate_stream, &input, &mut output, MZFlush::None);

    assert!(result.status.is_ok());
    // Should not consume everything, there is not enough space in the buffer for the output.
    assert_ne!(result.bytes_consumed, input.len())
}

Output buffer is only 8 bytes in size, but miniz_oxide consumes the whole input buffer anyway. This causes the issue in flate2 that does not happen when using another deflate implementations such as zlib, so I think it is a miniz_oxide bug: https://github.com/rust-lang/flate2-rs/issues/434

link2xt commented 1 month ago

This underlying decompress function is probably fine, it takes out and should make sure not to overflow it: https://github.com/Frommi/miniz_oxide/blob/0a33effd414711b379e01b0613ba5ae85a0e14d0/miniz_oxide/src/inflate/core.rs#L1172-L1178

But inflate_loop here passes state.dict into decompress as the output buffer: https://github.com/Frommi/miniz_oxide/blob/0a33effd414711b379e01b0613ba5ae85a0e14d0/miniz_oxide/src/inflate/stream.rs#L308

And this output buffer size is 32768 (TINFL_FZ_DICT_SIZE).

link2xt commented 1 month ago

Maybe there is no bug, I made decompression work with flate2 using miniz_oxide backend: https://github.com/rust-lang/flate2-rs/pull/436 Seems the input just got consumed into InflateState and on the next call it gets out. This does not happen with zlib, but still usable.

link2xt commented 1 month ago

@oyvindln I made async-compression (which in turn uses flate2) to work: https://github.com/Nullus157/async-compression/pull/294

But this requires trying to inflate even when we have no input anymore to check if we can extract something from the internal state of the decoder. If it is possible to change miniz_oxide to behave like zlib and not consume more input than actually used that would be great as it will make miniz_oxide more difficult to misuse. Otherwise I think this "footgun" should at least be documented in miniz_oxide and flate2 documentation, current flate2 documentation is misleading and this led to bug in async-compression.

phord commented 1 month ago

I believe it is possible to have all of your input "consumed" yet still have remaining bytes to retrieve via zlib, too. It may be only a byte or two. But if you're not checking the state variables for "done". you may still be leaving pieces behind.

oyvindln commented 1 month ago

Hm, would have to investigate that as well then

oyvindln commented 1 month ago

Seeing as this seems to be how the original miniz behaves and is designed around I think it makes sense to keep this as the default behaviour (though document it properly.

I think it would be doable to make a separate variant of the function by using a non-circular internal buffer that would limit the buffer in the inner call to decompress to the same size as the output buffer so it behaves similarly to zlib though. It's possible it would have slightly different performance characteristics but I can't tell without implementing it.