Closed Eric-Kobayashi closed 10 months ago
Today had the same issue, got an broken zip archive, and at least async_zip is unable to unarchive this archive. Good thing is I could unarchive it within 7z and restore it.
Hi, i had a similar issue and tracked it down to missing Zip64 EOCDR and EOCDL at the end of the file (The actual headers for data sections are correct).
The problem is that ZipFileWriter
is not automatically setting is_zip64
when the cumulative file size is above 4GB.
As a workaround you can use: force_zip64 like so writer = writer.force_zip64();
and the archive should work just fine.
I've opened a PR (#106) to fix this issue.
Hi @St4NNi thanks a lot for your suggestion.
I've tried the writer = writer.force_zip64();
workaround but it results in a zip file that has some files with header error, when trying to unzip it:
Archives with Errors: 1
Sub items Errors: 1613
These 1613 files were created but have no content as a result. This did not happen before implementing the workaround.
I've tried to pull your branch from this commit implemented in the PR https://github.com/Majored/rs-async-zip/pull/106 https://github.com/St4NNi/rs-async-zip/commit/1466a6c97c4c9407aa00e7247dba6f61cef0ad7d with the same outcome as the above workaround.
### Some additional information that may be helpful:
- My test folder is 7.6 GB and after zipping it reduces to 4.6 GB. It contains 18000+ files.
- I've also tried to zip the folder with Mac system utility and observe the below when running `7z l <zip file>` with 7zip 22.01:
zip file generated by Mac system utility:
7-Zip (z) 22.01 (arm64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15 64-bit arm_v:8 locale=C.UTF-8 Threads:6
Scanning the drive for archives: 1 file, 4652012039 bytes (4437 MiB)
Listing archive: mac-test.zip
-- Path = mac-test.zip Type = zip Physical Size = 4652012039 64-bit = + Characteristics = Zip64
zip file generated by async_zip after the workaround (or the fix in #106 )
7-Zip (z) 22.01 (arm64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15 64-bit arm_v:8 locale=C.UTF-8 Threads:6
Scanning the drive for archives: 1 file, 4665989723 bytes (4450 MiB)
Listing archive: mac_async_zip_zip64_fix.zip
-- Path = mac_async_zip_zip64_fix.zip Type = zip Physical Size = 4665989723 64-bit = + Characteristics = Zip64 Unsorted_CD
#### Observation:
- Compared to the correctly zipped file, there is an additional `Unsorted_CD` flag in the async_zip generated zip.
- Compared to the zip file before the workaround or the fix in #106, the `ERRORS: Headers Error` no longer occurs and the `64-bit = +` is displayed
Hey, yes unfortunately there seems to be something else broken.
I could reproduce this on my side as well. Maybe to just give everyone else also some insights: The Unsorted_CD
notification is unrelated, 7z just expects the records to be sorted alphabetically, so you just can .sort()
your Vec<PathBuf>
and this should be gone.
0x00
bytes one that is exactly 0xFFFFFF
/ u32::MAX
long -> truncate -s 4294967295 test
and one that is any size above (e.g. 10 Bytes) truncate -s 10 test2
Zip both files with Compression::Stored (to make sure that we can more easily inspect the headers and have a more reproducible file size)
let mut file = File::create("test.zip").await?;
let mut writer = ZipFileWriter::with_tokio(&mut file);
writer = writer.force_zip64();
for file in ["test", "test2"] {
let builder = ZipEntryBuilder::new(file.into(), Compression::Stored);
let mut data = Vec::new();
File::open(file).await?.read_to_end(&mut data).await?;
writer.write_entry_whole(builder, &data).await?;
}
writer.close().await?;
7z t test.zip
will result in ERROR: Headers Error : test2
(The first file is fine)Sadly I may have not the time to investigate this further, but from my understanding this is broken because the lh_offset
is beyond the u32
bounds. I tried to clamp this to NON_ZIP64_MAX_SIZE
and add this information to relative_header_offset
in Zip64ExtendedInformationExtraField
but with no luck. I will give an update if I have figured something out.
Actually I think I've figured it out. Thanks to the above minimal working example and hexdump
comparisons with 7z
output as reference.
Zip64 is needed in one of the following cases:
The previous implementation checked only if A was the case and completely ignored B. The specification says that if the compressed and uncompressed sizes are below 4 Gib the regular fields in the file headers can (and should) be used [ref].
The actual problem was as expected the lh_offset
in the central directory file header. This needed to be upper bounded to NON_ZIP64_MAX_SIZE
but additionally a Zip64 extended extra field with relative_header_offset
was needed (to specify the u64 offset for the actual beginning of the entry).
This is only required in the Central directory and does not need to be added in the local file header that comes in front of the raw data, "valid" fields from the regular format should be skipped completely (in order) in the Zip64 extra fields. Since both compressed and uncompressed sizes are valid no Zip64 extra field is necessary in the local file header. The extra field is only required in the CD file header.
https://github.com/Majored/rs-async-zip/pull/106/commits/6118ad7b613395c56f735e33a605dc54f12abd7e / https://github.com/Majored/rs-async-zip/pull/106/commits/3e080b736c7ca843bc7a0bbad5d7126df1d828d6 fix this issue by creating a Zip64ExtendedInformationExtraFieldBuilder
that keeps track of the added options and creates a valid ZIP64 extra entry via its build
method.
If only case B occurs, the extra information is purely added to the central directory file header. If A is the case (with or without B) the local file header is extended by a Zip64 extra entry. The builder also keeps track of the actual expected data_size
and removes the need for the hard-coded 16 bytes
(because a pure relative_header_offset
is only 8 bytes and relative offset + sizes is 24 bytes (8 + 16) in addition to the 16 bytes for (uncompressed + compressed) sizes only.
Caveat: This is more or less just a PoC additional tests especially for the entry_stream
are needed to verify that Zip64 handling now behaves correctly.
Can confirm the commit https://github.com/Majored/rs-async-zip/commit/1466a6c97c4c9407aa00e7247dba6f61cef0ad7d fixes the issue during my test:
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,6 CPUs LE)
Scanning the drive for archives: 1 file, 4666009079 bytes (4450 MiB)
Path = mac_async_zip_zip64_fix.zip Type = zip Physical Size = 4666009079 64-bit = +
Everything is Ok
Files: 18153 Size: 7659607926 Compressed: 4666009079
7-Zip (z) 22.01 (arm64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15 64-bit arm_v:8 locale=C.UTF-8 Threads:6
Scanning the drive for archives: 1 file, 4666009079 bytes (4450 MiB)
Path = mac_async_zip_zip64_fix.zip Type = zip Physical Size = 4666009079 64-bit = + Characteristics = Zip64
Everything is Ok
Files: 18153 Size: 7659607926 Compressed: 4666009079
- The Mac system utility is also able to unzip the archive
Thanks @St4NNi for providing the fix and hope #106 will be reviewed and merged.
@Eric-Kobayashi Do you think you could retest the current main branch for me in relation to this? Just want to double check none of the conflicts from two later PRs have caused any issues.
@Majored Can confirm the current main branch works as expected.
Summary
The async_zip rust library appears to write an incorrect Zip64 header, which is causing problems when dealing with zip files larger than 4GB.
Expected Behaviour
The async_zip library should generate zip files with correct Zip64 headers, ensuring compatibility with a range of extraction tools, including older versions of 7zip.
Actual Behaviour
The library's incorrect Zip64 header implementation leads to extraction failure when using default version of 7zip (16.02) on ubuntu system.
Reproduction Steps
pub async fn zip_folder( files: Vec,
input_path: &Path,
target: &str,
) -> Result<()>
let algorithm = Compression::Deflate;
let mut writer = ZipFileWriter::new(File::create(Path::new(&target)).await?);
let input_dir_str = input_path.as_os_str().to_str().ok_or(anyhow!("Input path not valid UTF-8."))?;
let file_count = files.len();
for (i, entry) in files.iter().enumerate() { let entry_path = entry.as_path(); let entry_str = entry_path.as_os_str().to_str().ok_or(anyhow!("Directory file path not valid UTF-8."))?; let name = &entry_str[input_dir_str.len() + 1..]; let mut input_file = File::open(entry_path).await?; let mut buffer = Vec::new(); input_file.read_to_end(&mut buffer).await?; let entry_options = EntryOptions::new(name.into(), algorithm).unix_permissions(0o755); writer.write_entry_whole(entry_options, &buffer).await?; } writer.close().await?; Ok(()) }
let source_location = Path::new("test_folder"); let input_pathbuf = source_location.canonicalize().maperr(|| "Unable to canonicalise input path.")?; let files: Vec = ...// get all files in source_location, input files are 4GB+ in total size
let target_location: &str = "test_folder.zip"
zip_folder(files, input_pathbuf.as_path(), &target_location).await?
Scanning the drive for archives: 1 file, 10977219364 bytes (11 GiB)
Extracting archive: test_folder.zip ERROR: test_folder.zip Can not open the file as archive
Can't open as archive: 1 Files: 0 Size: 0 Compressed: 0
Scanning the drive for archives: 1 file, 10977219364 bytes (11 GiB)
Extracting archive: test_folder.zip
ERRORS: Headers Error
7-Zip (z) 22.01 (arm64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
64-bit arm_v:8 locale=C.UTF-8 Threads:6
Scanning the drive for archives: 1 file, 4665988600 bytes (4450 MiB) Extracting archive: test_folder.zip
WARNINGS: 32-bit overflow in headers