Majored / rs-async-zip

An asynchronous ZIP archive reading/writing crate.
MIT License
131 stars 44 forks source link

Small fix for missing Zip64 EOCDR / EOCDL if cumulative size > 4Gib and all entrys are < 4Gib #106

Closed St4NNi closed 1 year ago

St4NNi commented 1 year ago

This is a very small PR that fixes https://github.com/Majored/rs-async-zip/issues/105 .

Explanation:

If the cumulative file size of all zip entrys exceeds the 4 Gib maximum for standard Zip a Zip64 EOCDR & EOCDL is needed.

This was not set automatically if no single entry exceeded the limit, the PR fixes this by checking the force_no_zip64 flag and updating the is_zip64 boolean if its allowed, otherwise the LargeFile error is raised:

if cd_offset > NON_ZIP64_MAX_SIZE as u64 {
    if self.force_no_zip64 {
        return Err(crate::error::ZipError::Zip64Needed(crate::error::Zip64ErrorCase::LargeFile));
    } else {
        self.is_zip64 = true;
    }
}

Afterwards the correct EOCDR & EOCDL for Zip64 will be added.

Additional Problem

105 revealed an additional problem with Zip64 handling:

Zip64 is needed in one of the following cases:

The previous implementation checked only if A was the case and completely ignored B. The specification says that if the compressed and uncompressed sizes are below 4 Gib the regular fields in the file headers can (and should) be used [ref].

The actual problem was as expected the lh_offset in the central directory file header. This needed to be upper bounded to NON_ZIP64_MAX_SIZE but additionally a Zip64 extended extra field with relative_header_offset was needed (to specify the u64 offset for the actual beginning of the entry).

This is only required in the Central directory and does not need to be added in the local file header that comes in front of the raw data, "valid" fields from the regular format should be skipped completely (in order) in the Zip64 extra fields. Since both compressed and uncompressed sizes are valid no Zip64 extra field is necessary in the local file header. The extra field is only required in the CD file header.

Fix

https://github.com/Majored/rs-async-zip/pull/106/commits/6118ad7b613395c56f735e33a605dc54f12abd7e / https://github.com/Majored/rs-async-zip/pull/106/commits/3e080b736c7ca843bc7a0bbad5d7126df1d828d6 fix this issue by creating a Zip64ExtendedInformationExtraFieldBuilder that keeps track of the added options and creates a valid ZIP64 extra entry via its build method.

If only case B occurs, the extra information is purely added to the central directory file header. If A is the case (with or without B) the local file header is extended by a Zip64 extra entry. The builder also keeps track of the actual expected data_size and removes the need for the hard-coded 16 bytes (because a pure relative_header_offset is only 8 bytes and relative offset + sizes is 24 bytes (8 + 16) in addition to the 16 bytes for (uncompressed + compressed) sizes only.

Caveat: This is more or less just a PoC additional tests especially for the entry_stream are needed to verify that Zip64 handling now behaves correctly.

Majored commented 1 year ago

Apologies for the delay here, and thank you for the amount of work put into debugging this/presenting a fix.

Merging. 👍