gendx / lzma-rs

An LZMA decoder written in pure Rust
MIT License
127 stars 27 forks source link

lzma2_compress and zx_compress do not compress input #99

Open holly-hacker opened 1 year ago

holly-hacker commented 1 year ago

I was comparing compression methods to figure out what to choose for my project and noticed some strange results:

postcard: 14880

postcard deflate: 3499
postcard brotli: 2792
postcard lz4: 4821
postcard lzma: 8496
postcard lzma2: 14884
postcard xz: 14932

In this snippet, lzma, lzma2 and xz are done using lzma_compress, lzma2_compress and xz_compress respectively. It seemed weird to me that lzma2 and xz are so poor, so I stepped through the code and noticed that it's not applying compression at all for lzma2 and xz:

/// Compress data with LZMA2 and default [`Options`](compress/struct.Options.html).
pub fn lzma2_compress<R: io::BufRead, W: io::Write>(
    input: &mut R,
    output: &mut W,
) -> io::Result<()> {
    encode::lzma2::encode_stream(input, output)
}

// ...

pub fn encode_stream<R, W>(input: &mut R, output: &mut W) -> io::Result<()>
where
    R: io::BufRead,
    W: io::Write,
{
    let mut buf = vec![0u8; 0x10000];
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            // status = EOF
            output.write_u8(0)?;
            break;
        }

        // status = uncompressed reset dict
        output.write_u8(1)?;
        // unpacked size
        output.write_u16::<BigEndian>((n - 1) as u16)?;
        // contents
        output.write_all(&buf[..n])?;
    }
    Ok(())
}

This sounds like either a bug, or something that should be documented better. As I understand from the readme file, lzmr-rs is only a decoder which is confusing since these compression functions are present.

gendx commented 1 year ago

See #9 and the "Encoder" section in the README:

For now, there is also a dumb encoder that only uses byte literals, with many hard-coded constants for code simplicity. Better encoders are welcome!

I can remove mention of "encoder" from the "About" description of this GitHub repo. Don't hesitate to send a PR with further suggestions on how to improve the wording in the README.

holly-hacker commented 5 months ago

I was reminded of this issue by your recent blogpost. Should I close this in favor of #9, or do you want to keep it open?