dyz1990 / sevenz-rust

A 7z decompressor/compressor lib written in pure Rust
Apache License 2.0
146 stars 24 forks

Something is wrong with bigger archives #21

Closed (amacal closed this 1 year ago)

amacal commented 1 year ago

I tried to decompress just a few bytes of two different files; one file works, the other one doesn't. Both files work correctly with 7-Zip.

enwiki-20230501-pages-meta-history23.xml-p50555787p50564553.7z - works
enwiki-20230501-pages-meta-history5.xml-p956483p958045.7z - doesn't work

I used the following code to test it:

use std::error::Error;
use std::io::Read;

use sevenz_rust::{Password, SevenZReader};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let filename = "/wikipedia/enwiki-20230501-pages-meta-history5.xml-p956483p958045.7z";
    let mut stream = SevenZReader::open(filename, Password::empty()).unwrap();

    stream.for_each_entries(|entry, mut reader| {
        // Read only the first 16 bytes of each entry.
        let mut buffer = [0; 16];
        println!("Reading entry: {:?}", entry);
        println!("{:?}", reader.read(&mut buffer[..]));
        println!("{:?}", std::str::from_utf8(&buffer));
        Ok(true)
    })?;

    Ok(())
}

output for enwiki-20230501-pages-meta-history23.xml-p50555787p50564553.7z:

Reading entry: SevenZArchiveEntry { name: "", has_stream: true, is_directory: false, is_anti_item: false, has_creation_date: false, has_last_modified_date: true, has_access_date: false, creation_date: FileTime(0), last_modified_date: FileTime(133279036098162380), access_date: FileTime(0), has_windows_attributes: true, windows_attributes: 0, has_crc: true, crc: 2292542832, compressed_crc: 0, size: 1585055636, compressed_size: 0, content_methods: [] }
Ok(16)
Ok("<mediawiki xmlns")

output for enwiki-20230501-pages-meta-history5.xml-p956483p958045.7z:

Reading entry: SevenZArchiveEntry { name: "", has_stream: true, is_directory: false, is_anti_item: false, has_creation_date: false, has_last_modified_date: true, has_access_date: false, creation_date: FileTime(0), last_modified_date: FileTime(133278559111423540), access_date: FileTime(0), has_windows_attributes: true, windows_attributes: 0, has_crc: true, crc: 401901466, compressed_crc: 0, size: 3575097180, compressed_size: 0, content_methods: [] }
Ok(0)
Ok("\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0")
dyz1990 commented 1 year ago

@amacal Thank you for your report. There was a numerical overflow issue while decoding a large file. I will fix it and release a new version as soon as possible.
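
For context, the failing entry's uncompressed size (3,575,097,180 bytes) is just above i32::MAX, while the working one (1,585,055,636 bytes) is below it. A minimal sketch of how a narrowing cast could produce the Ok(0) symptom (hypothetical illustration only, not the crate's actual code):

fn main() {
    // Uncompressed sizes reported for the two entries above.
    let works: u64 = 1_585_055_636; // below i32::MAX
    let fails: u64 = 3_575_097_180; // above i32::MAX, below u32::MAX

    // Hypothetical narrowing cast in a remaining-bytes counter: `as i32`
    // wraps, the counter goes negative, and the reader reports "EOF"
    // immediately, which would explain read() returning Ok(0).
    println!("{} as i32 = {}", works, works as i32); // 1585055636
    println!("{} as i32 = {}", fails, fails as i32); // -719870116
}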

dyz1990 commented 1 year ago

@amacal Version 0.2.11 has fixed this issue. You can update and try it out.
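
For a fuller check after updating to 0.2.11, extracting the whole archive (rather than only the first 16 bytes) exercises the large-file path end to end. A rough sketch using the crate's decompress_file helper, with placeholder paths:

fn main() {
    // Placeholder paths: point these at a real archive and output directory.
    let archive = "/wikipedia/enwiki-20230501-pages-meta-history5.xml-p956483p958045.7z";
    let dest = "/wikipedia/extracted";

    // Decompresses every entry in full, so a size/offset overflow in the
    // decoder would surface as an error or a truncated output file.
    sevenz_rust::decompress_file(archive, dest).expect("decompress complete");
}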

amacal commented 1 year ago

Cool! It works even for the biggest file in the dataset: enwiki-20230501-pages-meta-history10.xml-p5096070p5137514.7z

Reading entry: SevenZArchiveEntry { name: "", has_stream: true, is_directory: false, is_anti_item: false, has_creation_date: false, has_last_modified_date: true, has_access_date: false, creation_date: FileTime(0), last_modified_date: FileTime(133278673660200860), access_date: FileTime(0), has_windows_attributes: true, windows_attributes: 0, has_crc: true, crc: 3468506479, compressed_crc: 0, size: 482374799893, compressed_size: 0, content_methods: [] }
Ok(16)
Ok("<mediawiki xmlns")