Majored / rs-async-zip

An asynchronous ZIP archive reading/writing crate.
MIT License
123 stars 40 forks source link

a computed CRC32 value did not match the expected value - when CRC is in data_descriptor #141

Open nipunn1313 opened 1 week ago

nipunn1313 commented 1 week ago

Related to #122 but slightly different

I'm working with v0.0.17

Here's a zipfile that reproes external_modules.zip

The zipfile was generated with https://www.npmjs.com/package/archiver - and seems to unzip fine with other tools.

Using the zipdetails tool, it shows this for the first couple entries.

00000 LOCAL HEADER #1       04034B50
00004 Extract Zip Spec      14 '2.0'
00005 Extract OS            00 'MS-DOS'
00006 General Purpose Flag  0008
      [Bits 1-2]            0 'Normal Compression'
      [Bit  3]              1 'Streamed'
00008 Compression Method    0008 'Deflated'
0000A Last Mod Time         58DB8572 'Thu Jun 27 16:43:36 2024'
0000E CRC                   00000000
00012 Compressed Length     00000000
00016 Uncompressed Length   00000000
0001A Filename Length       001F
0001C Extra Length          0000
0001E Filename              'node_modules/.package-lock.json'
0003D PAYLOAD

008D2 STREAMING DATA HEADER 08074B50
008D6 CRC                   528AD14A
008DA Compressed Length     00000895
008DE Uncompressed Length   0000105F

It seems that the data_descriptor includes the CRC. I did a bit of digging in my repro and found that async-zip is using the 00000000 CRC for comparison rather than the one in the data_descriptor.

https://github.com/Majored/rs-async-zip/blob/main/SPECIFICATION.md#439 Spec seems to indicate that if header.flags.data_descriptor is set, then crc/lengths should be taken from data descriptor rather than local file header.

nipunn1313 commented 1 week ago

Like @ldanilek mentioned in #122 - Things parsed in 0.0.9 and now no longer work.

there is a workaround which is to manually bypass crc check.

entry_reader.reader_mut().read_to_end(&mut contents).await?;

instead of

            entry_reader
                .reader_mut()
                .read_to_end_checked(&mut contents)
                .await?;

Don't love bypassing the CRC check though for obvious reasons.

nipunn1313 commented 1 week ago

It looks like 0.0.9 had code like this

    /// Returns true if the computed CRC32 value of all bytes read so far matches the expected value.
    pub fn compare_crc(&mut self) -> bool {
        let hasher = std::mem::take(&mut self.hasher);
        let final_crc = hasher.finalize();

        if self.meta.general_purpose_flag.data_descriptor {
            self.data_descriptor.expect("Data descriptor was not read").0 == final_crc
        } else {
            self.entry.crc32() == final_crc
        }
    }

which on 0.0.17 no longer seems to work.

https://github.com/Majored/rs-async-zip/blob/main/src/base/read/stream.rs#L177

It seems that the CRC is not read out of the data descriptor.

Then, I think you'd want the CRC check to happen in done() - after parsing the data descriptor - rather than in read_to_end_checked, at least for these data descriptor cases.