Majored / rs-async-zip

An asynchronous ZIP archive reading/writing crate.

`deflate decompression error` after around 20% of the file is decompressed #67

Closed · theRookieCoder closed this issue 1 year ago

theRookieCoder commented 1 year ago

I'm currently migrating from zip to async_zip because my program uses tokio I/O everywhere except when handling zip files. I'm using the GitHub main branch rather than the crates.io release because of #64. Here is the function I wrote for extracting a given reader to an output directory:

/// Extract the `input` zip file to `output_dir`
pub async fn extract_zip(
    input: impl AsyncRead + AsyncSeek + Unpin,
    output_dir: &Path,
) -> Result<()> {
    let mut zip = ZipFileReader::new(input).await?;
    // Print the full index range (i.e. the total number of entries) and each
    // index, to see how far extraction gets before it errors out.
    for i in dbg!(0..zip.file().entries().len()) {
        dbg!(i);
        let entry = zip.file().entries()[i].entry();
        let path = output_dir.join(entry.filename());

        if entry.dir() {
            create_dir_all(&path).await?;
        } else {
            if let Some(up_dir) = path.parent() {
                if !up_dir.exists() {
                    create_dir_all(up_dir).await?;
                }
            }
            // Stream the entry's decompressed contents into the output file.
            copy(&mut zip.entry(i).await?, &mut File::create(&path).await?).await?;
        }
    }
    Ok(())
}

It's almost identical to your example extractor, minus the path sanitisation, because I trust the download sources (for now at least).

I was getting deflate decompression errors at seemingly random places, so I tried debugging it by printing out the indices and the total length (as shown in the code above), and I came to a weird conclusion: decompression consistently fails at around 15-20% of the way through the entries. I have no idea what's going on; thanks in advance for any help.
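For reference, here is roughly how the function gets called; a minimal sketch, where the archive path, the output directory, and the assumption that `Result` is anyhow's alias are all placeholders rather than my actual project code:

use std::path::Path;
use tokio::fs::File;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // tokio's File implements both AsyncRead and AsyncSeek, so it satisfies
    // the bounds on extract_zip's `input` parameter. "archive.zip" and
    // "output" are placeholder paths.
    let archive = File::open("archive.zip").await?;
    extract_zip(archive, Path::new("output")).await?;
    Ok(())
}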

Majored commented 1 year ago

Just pushed https://github.com/Majored/rs-async-zip/commit/28a932f7ef0c29e682419bad9ab50714150c1fee which I think fixes this problem.

We were using the uncompressed size rather than the compressed size when limiting how much data is read for an entry. Because all of the compression algorithms used are self-terminating, this usually didn't manifest itself: if the compressed data is smaller than the uncompressed size, the decompressor finishes before the limit is reached. The upstream error will only ever occur when the compressed size is larger than the uncompressed size, so the compressed stream gets truncated, which is what I think was happening here.
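To illustrate the idea (a hedged sketch of the concept, not the actual code in that commit): the raw byte stream handed to the decompressor needs to be capped at the entry's compressed size, e.g. with tokio's AsyncReadExt::take; capping it at the uncompressed size truncates the deflate stream exactly when the compressed data is the larger of the two.

use tokio::io::{AsyncRead, AsyncReadExt, Take};

// Sketch only: cap the *compressed* stream before handing it to the decompressor.
fn limit_entry_stream<R: AsyncRead>(raw: R, compressed_size: u64) -> Take<R> {
    // Capping at the uncompressed size instead would cut the deflate stream
    // short whenever compressed_size > uncompressed_size, and the decompressor
    // would then fail partway through the entry.
    raw.take(compressed_size)
}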

If you could pull and test again, that would be helpful.

theRookieCoder commented 1 year ago

Yup, it works now! Thanks for the quick fix.

theRookieCoder commented 1 year ago

I also had an issue with writing zip files, which seems to have been fixed by this too!