BurntSushi / rust-snappy

Snappy compression implemented in Rust (including the Snappy frame format).
BSD 3-Clause "New" or "Revised" License

decompress_len output size, unexpected value #40

Closed milesgranger closed 3 years ago

milesgranger commented 3 years ago

Happy Friday!

My question is regarding decompress_len. Perhaps I'm using it wrong, but it returns a value significantly smaller than the actual decompressed length.

Given this:

#[cfg(test)]
mod tests {

    use std::io::Write;
    use snap::raw::decompress_len;

    #[test]
    fn test_decompress_len() {
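        // 5,000,000 copies of a 54-byte phrase: 270,000,000 bytes in total.
        let data = (0..5_000_000)
            .map(|_| b"oh what a beautiful morning, oh what a beautiful day!!".to_vec())
            .flatten()
            .collect::<Vec<u8>>();

        let compressed = {
            let buffer = Vec::new();
            let mut encoder = snap::write::FrameEncoder::new(buffer);
            encoder.write_all(&data).unwrap();
            // Flush so that any buffered data reaches the underlying Vec before it is copied.
            encoder.flush().unwrap();
            encoder.get_ref().to_vec()
        };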
        let estimated_decompression_len = decompress_len(&compressed).unwrap();
        println!("Estimated size: {}, actual: {}", estimated_decompression_len, data.len());
        assert!(estimated_decompression_len >= data.len());
    }
}

estimated_decompression_len ends up as 895, but the actual length is 270000000.

I must be doing something wrong, appreciate any pointers. :-)

BurntSushi commented 3 years ago

Think about what you're asking for: the total decompressed size of some set of bytes that was compressed as a stream. This is fundamentally impossible without consuming the entire stream.

More precisely, you're using a method from the raw module, which only works on the "raw" snappy format, on bytes encoded via the "frame" snappy format. You might just be lucky that decompress_len doesn't return an error in this particular case.
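
As an illustration of where a value like 895 can come from, here is a minimal sketch. It assumes the two format details from the Snappy specifications: a raw block starts with its uncompressed length as a little-endian varint, and a frame stream starts with the stream identifier chunk ff 06 00 00 "sNaPpY". The helper read_varint is hypothetical and written only for this example; it is not the snap crate's implementation.

```rust
/// First chunk of every Snappy frame stream: chunk type 0xff,
/// a 3-byte little-endian length (6), then the magic "sNaPpY".
const STREAM_IDENTIFIER: [u8; 10] =
    [0xff, 0x06, 0x00, 0x00, b's', b'N', b'a', b'P', b'p', b'Y'];

/// Hypothetical helper: decode a little-endian base-128 varint, the kind of
/// preamble a raw Snappy block uses to store its uncompressed length.
/// Returns the decoded value and the number of bytes consumed, or None if
/// no terminating byte is found within the first 5 bytes.
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().enumerate().take(5) {
        // The low 7 bits carry data; the high bit means "more bytes follow".
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    // Interpreting the frame header as if it were a raw-format preamble:
    // 0xff -> 127 with the continuation bit set, 0x06 -> 6 << 7 = 768.
    let (len, used) = read_varint(&STREAM_IDENTIFIER).unwrap();
    println!("decoded length: {}, bytes consumed: {}", len, used);
}
```

Under these assumptions, the first two bytes of the frame header decode to 127 + 768 = 895, which matches the value reported above. The number says nothing about the payload; it is just the frame header misread as a raw-format length.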