dyz1990 / sevenz-rust

A 7z decompressor/compressor library written in pure Rust
Apache License 2.0
146 stars, 24 forks

ChecksumVerificationFailed on read of many files in solid archive #31

Status: Open. Revertron opened this issue 11 months ago.

Revertron commented 11 months ago

I have solid archives with a block size of 16 MB, and many of the files fail to read with ChecksumVerificationFailed.

Example archive: https://up.revertron.com/Memes.7z

Example code:

```rust
use std::io::Read; // brings `read_to_end` into scope for the entry reader

use sevenz_rust::{Password, SevenZReader};

pub fn test_blocks() {
    let mut buf = Vec::new();

    let mut archive = SevenZReader::open("Memes.7z", Password::empty()).expect("Error opening 7z archive");
    let _ = archive.for_each_entries(|entry, reader| {
        println!("Reading file {}", &entry.name);
        if "FcGD7nuX0AgQNS_.jpg" == entry.name {
            println!("*** Found file {}", &entry.name);
            match reader.read_to_end(&mut buf) {
                Ok(_size) => {
                    println!("Have read file {}", &entry.name);
                    return Ok(false);
                }
                Err(e) => {
                    println!("Error reading file {}: {}", &entry.name, &e);
                    return Err(sevenz_rust::Error::from(e));
                }
            }
        }
        // Every other entry is left unread before moving on to the next one.
        Ok(true)
    });
    assert!(!buf.is_empty())
}
```
dyz1990 commented 11 months ago

You can't skip reading the preceding entries, even if you don't need them: each one has to be read through to the end. Try this code:

```rust
use std::io::Read;

use sevenz_rust::{Password, SevenZReader};

pub fn test_blocks() {
    let mut buf = Vec::new();

    let mut archive =
        SevenZReader::open("Memes.7z", Password::empty()).expect("Error opening 7z archive");
    let _ = archive.for_each_entries(|entry, reader| {
        println!("Reading file {}", &entry.name);
        if "FcGD7nuX0AgQNS_.jpg" == entry.name {
            println!("*** Found file {}", &entry.name);
            match reader.read_to_end(&mut buf) {
                Ok(_size) => {
                    println!("Have read file {}", &entry.name);
                    return Ok(false);
                }
                Err(e) => {
                    println!("Error reading file {}: {}", &entry.name, &e);
                    return Err(sevenz_rust::Error::from(e));
                }
            }
        } else {
            // Consume the reader to skip the file, even if we don't need it:
            // later entries in the same solid block depend on this data.
            let mut skip_buf = [0u8; 4096];
            loop {
                match reader.read(&mut skip_buf) {
                    Ok(0) => break,
                    Ok(_) => {}
                    // Propagate read errors instead of silently stopping the skip.
                    Err(e) => return Err(sevenz_rust::Error::from(e)),
                }
            }
            Ok(true)
        }
    });
    assert!(!buf.is_empty())
}
```
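A more compact way to drain an unwanted entry is to copy it into `std::io::sink()`, which discards the bytes while still surfacing read errors. This is a sketch, not part of sevenz-rust: the helper name `drain_entry` is hypothetical, and it still has to decompress every byte of the entry, so it should not be expected to be much faster than the manual loop. A `Cursor` stands in for the per-entry reader here.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: reads `reader` to the end and throws the bytes away.
/// Returns the number of bytes consumed. The decompressor still produces
/// every byte; they are simply written to `io::sink()` instead of a buffer.
fn drain_entry(reader: &mut dyn Read) -> io::Result<u64> {
    io::copy(reader, &mut io::sink())
}

fn main() -> io::Result<()> {
    // A Cursor stands in for the per-entry reader passed to for_each_entries.
    let mut entry = Cursor::new(vec![7u8; 10_000]);
    let skipped = drain_entry(&mut entry)?;
    println!("skipped {} bytes", skipped); // prints "skipped 10000 bytes"
    Ok(())
}
```

Inside the closure, the `else` branch would then reduce to `drain_entry(reader).map_err(sevenz_rust::Error::from)?; Ok(true)`.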
Revertron commented 11 months ago

Thanks for the quick response! This works, but it is very slow, even if I make the buffer 2 MB, move it out of the closure, and reuse it.

Is there any way to make it faster? :(

Revertron commented 11 months ago

I've gone through the reader code, and I think we need to change all those `R: Read` bounds to `R: Read + Seek` and then just skip the unread bytes. But a trait object can't combine multiple non-auto traits: https://doc.rust-lang.org/error_codes/E0225.html So we would need to create a combined trait like this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=66c772a420cb50c0fa78ab3d91bda052
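The usual workaround for E0225 is a combined supertrait with a blanket impl, sketched below. This is the general pattern, not necessarily what the linked playground contains; the names `ReadSeek` and `skip_bytes` are hypothetical. It only helps for sources that genuinely support seeking, such as a raw file or an in-memory buffer.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

// `Box<dyn Read + Seek>` is rejected with E0225 because a trait object may
// name only one non-auto trait. A supertrait with a blanket impl combines
// both, so `dyn ReadSeek` exposes the methods of `Read` and `Seek`.
trait ReadSeek: Read + Seek {}
impl<T: Read + Seek> ReadSeek for T {}

/// Skips `n` bytes of the underlying source by seeking instead of reading.
fn skip_bytes(src: &mut dyn ReadSeek, n: i64) -> io::Result<u64> {
    src.seek(SeekFrom::Current(n))
}

fn main() -> io::Result<()> {
    let mut src: Box<dyn ReadSeek> = Box::new(Cursor::new(b"hello, world".to_vec()));
    skip_bytes(&mut *src, 7)?;
    let mut rest = String::new();
    src.read_to_string(&mut rest)?;
    println!("{}", rest); // prints "world"
    Ok(())
}
```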

dyz1990 commented 11 months ago

@Revertron Because the data to be decompressed depends on the data that precedes it, you cannot simply skip the earlier data and decompress only the later part. This is why the reader does not implement the Seek trait.
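The dependency can be illustrated with a toy back-reference decoder. The format below is a simplified LZ77-style scheme made up for illustration, not the LZMA/LZMA2 coder that 7z actually uses (which additionally carries adaptive range-coder state), but it shows the same property: a back-reference copies bytes from earlier output, so decoding cannot start in the middle of the stream.

```rust
// Toy LZ77-style format (NOT the real 7z/LZMA format): each token is either
// a literal byte or a back-reference that copies `len` bytes starting
// `dist` bytes back in the already-decoded output.
#[derive(Clone, Copy)]
enum Token {
    Literal(u8),
    Match { dist: usize, len: usize },
}

/// Decodes `tokens`, returning None if a back-reference points before the
/// start of the output, i.e. at data that was never decoded.
fn decode(tokens: &[Token]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for &t in tokens {
        match t {
            Token::Literal(b) => out.push(b),
            Token::Match { dist, len } => {
                if dist == 0 || dist > out.len() {
                    return None; // depends on bytes we do not have
                }
                let start = out.len() - dist;
                // Byte-by-byte copy so overlapping matches (len > dist) work.
                for i in 0..len {
                    let b = out[start + i];
                    out.push(b);
                }
            }
        }
    }
    Some(out)
}

fn main() {
    let tokens = [
        Token::Literal(b'a'),
        Token::Literal(b'b'),
        Token::Literal(b'c'),
        Token::Match { dist: 3, len: 6 },
    ];
    // Decoding from the start works.
    let full = decode(&tokens).unwrap();
    println!("{}", String::from_utf8(full).unwrap()); // prints "abcabcabc"
    // "Seeking" past the first three tokens leaves the match nothing to copy.
    assert!(decode(&tokens[3..]).is_none());
}
```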

Revertron commented 11 months ago

But the 7-Zip app definitely skips all blocks before the block containing the file being extracted. Is it possible to implement this?

dyz1990 commented 11 months ago

It's not easy, but I'll give it a try.

dyz1990 commented 11 months ago

@Revertron I noticed that the file "Memes.7z" contains more than one solid stream, so you can speed up decompression by skipping the streams that don't contain the files you need.

See the example forder_dec.rs for how to do this, and mt_decompress.rs if you want to use multiple threads.
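The block-skipping idea can be sketched independently of the library: group entries by the solid block that holds their data, decode only the blocks that contain a requested file, and stop each one after its last requested entry. `EntryInfo` and `plan_blocks` are hypothetical and assume entries are listed in the order they are stored within each block; the real block-level API is what forder_dec.rs is pointed to for, and this sketch does not reproduce it.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical, library-agnostic view of an archive entry: its name and the
/// index of the solid block that holds its data (None for directories or
/// empty files, which have no stream).
struct EntryInfo {
    name: &'static str,
    block: Option<usize>,
}

/// For each block that must be decoded, returns how many of its entries have
/// to be decompressed: everything up to and including the last wanted one.
/// Blocks absent from the result contain no wanted files and can be skipped.
fn plan_blocks(entries: &[EntryInfo], wanted: &HashSet<&str>) -> BTreeMap<usize, usize> {
    let mut plan = BTreeMap::new();
    let mut pos_in_block: BTreeMap<usize, usize> = BTreeMap::new();
    for e in entries {
        if let Some(b) = e.block {
            let pos = pos_in_block.entry(b).or_insert(0);
            *pos += 1;
            if wanted.contains(e.name) {
                plan.insert(b, *pos);
            }
        }
    }
    plan
}

fn main() {
    let entries = [
        EntryInfo { name: "a.jpg", block: Some(0) },
        EntryInfo { name: "b.jpg", block: Some(0) },
        EntryInfo { name: "dir", block: None },
        EntryInfo { name: "c.jpg", block: Some(1) },
        EntryInfo { name: "d.jpg", block: Some(1) },
        EntryInfo { name: "e.jpg", block: Some(2) },
    ];
    let wanted = HashSet::from(["d.jpg"]);
    // Only block 1 needs decoding, and only its first 2 entries.
    println!("{:?}", plan_blocks(&entries, &wanted)); // prints "{1: 2}"
}
```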

pavpen commented 8 months ago

I think you should at least document this behavior in the description of `for_each_entries` and related functions. I spent a day debugging my code before ending up here.

dyz1990 commented 7 months ago

@pavpen Sorry about that. I'll add documentation for the method.