Hi,
Is there something that does not work when you use the produced archives, or is it only the warning you get from zip -F? The warning appears because the Version Needed To Extract field in the local header does not match the one in the central directory. The CRC checksum is OK (one of your comments suggests that the CRC checksum does not match the one in the central directory, but in fact it does).
Nevertheless, this is strange behavior and I'll take a look at it. There is also another difference between the archives created by ConcurrentJarCreatorTest and ZipArchiverTest. The file created by ZipArchiverTest contains data descriptor records even though a) the data descriptor flag is not set and b) the local header's CRC checksum and entry length are not set to zero. The file created by ConcurrentJarCreatorTest does not contain data descriptor records at all. I think those records are not needed.
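For anyone who wants to verify this on their own archives, here is a minimal sketch (the archive path is just a placeholder) that prints whether the data descriptor bit (bit 3 of the general purpose flag) is set for each entry, and scans the raw bytes for the optional PK\x07\x08 descriptor signature (descriptors may legally be written without that signature, so a zero count is only a hint):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;

public class DataDescriptorCheck {
    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "archive1.zip"; // placeholder archive name

        // Print the data descriptor bit as recorded in the central directory.
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<ZipArchiveEntry> entries = zip.getEntries();
            while (entries.hasMoreElements()) {
                ZipArchiveEntry entry = entries.nextElement();
                System.out.println(entry.getName() + " usesDataDescriptor="
                        + entry.getGeneralPurposeBit().usesDataDescriptor());
            }
        }

        // Count occurrences of the optional data descriptor signature PK\7\8.
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        int signatures = 0;
        for (int i = 0; i + 3 < bytes.length; i++) {
            if (bytes[i] == 0x50 && bytes[i + 1] == 0x4B
                    && bytes[i + 2] == 0x07 && bytes[i + 3] == 0x08) {
                signatures++;
            }
        }
        System.out.println("data descriptor signatures found: " + signatures);
    }
}
```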
I found where the problem is.
Short version
The central directory Version Needed To Extract [1] field is set to 2.0 and the local file header field is set to 1.0. In my opinion it is completely safe to ignore the warning. Still, this is a bug in commons-compress and I'll report it.
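To see the mismatch directly, here is a minimal sketch (the archive path is a placeholder, and it assumes a commons-compress version recent enough to expose ZipArchiveEntry#getVersionRequired, i.e. 1.11+) that reads the Version Needed To Extract value from the first local file header and from the central directory:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;

public class VersionNeededCheck {
    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "archive1.zip"; // placeholder archive name

        // Version Needed To Extract of the first local file header: two little-endian
        // bytes right after the 4-byte PK\3\4 signature. This assumes the archive
        // starts directly with a local file header, which is the usual layout.
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            byte[] header = new byte[6];
            in.readFully(header);
            if (header[0] != 'P' || header[1] != 'K' || header[2] != 3 || header[3] != 4) {
                throw new IllegalStateException("archive does not start with a local file header");
            }
            int version = (header[4] & 0xFF) | ((header[5] & 0xFF) << 8);
            System.out.println("local file header: " + (version / 10) + "." + (version % 10));
        }

        // The same field as stored in the central directory, read via commons-compress.
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<ZipArchiveEntry> entries = zip.getEntries();
            while (entries.hasMoreElements()) {
                ZipArchiveEntry entry = entries.nextElement();
                int version = entry.getVersionRequired();
                System.out.println("central directory " + entry.getName() + ": "
                        + (version / 10) + "." + (version % 10));
            }
        }
    }
}
```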
Long version
I was wrong about the data descriptor flag - it is set correctly, but the rest still stands. The difference between the archives created by ConcurrentJarCreatorTest and AbstractZipArchiver is that the former constructs the ZipArchiveOutputStream using a File instance and the latter uses a BufferedOutputStream. In the first case ZipArchiveOutputStream has random access to the archive file (as long as the file supports such access) and in the second case it does not. Why does random access matter? Well, the CRC checksum, compressed size and uncompressed size fields are part of the local file header, and the local file header is placed before the compressed data. So if you don't have random access, you can set the CRC checksum, compressed size and uncompressed size fields to zero, write the compressed data, and then add a data descriptor record [2] with the correct values. Unfortunately, due to a bug in commons-compress, the values for Version Needed To Extract in the local file header and in the central directory do not match when data descriptor records are added.
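To illustrate the two construction paths (the output file names below are made up), here is a minimal sketch showing that a ZipArchiveOutputStream built from a File reports itself as seekable, while one built from an OutputStream does not:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;

import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

public class SeekableVsStreaming {
    public static void main(String[] args) throws Exception {
        // Seekable case: the stream can jump back and patch the CRC/size fields into
        // the already-written local file header, so no data descriptor is required.
        try (ZipArchiveOutputStream fileBased =
                new ZipArchiveOutputStream(new File("seekable.zip"))) {
            System.out.println("file based, seekable:   " + fileBased.isSeekable());
        }

        // Streaming case: the local file header has already gone downstream by the
        // time the CRC/sizes are known, so either the values must be known up front
        // or a data descriptor record has to follow the entry data.
        try (ZipArchiveOutputStream streamBased = new ZipArchiveOutputStream(
                new BufferedOutputStream(new FileOutputStream("streaming.zip")))) {
            System.out.println("stream based, seekable: " + streamBased.isSeekable());
        }
    }
}
```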
It is my opinion that the data descriptor records are not needed in our case. The individual entries are not compressed by the ZipArchiveOutputStream. They are compressed separately in parallel and then just added using ZipArchiveOutputStream#addRawArchiveEntry. So ZipArchiveOutputStream knows the CRC checksum, compressed size and uncompressed size values, as they are already calculated. Those values can be set directly in the local file header, and that is exactly what happens. That is why the resulting archive has the correct values in the local file header and not zeros. But if the header contains the correct values, then the data descriptor records are redundant. And commons-compress could be improved in this regard so it does not add data descriptor records when they are not needed.
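Here is a minimal, single-threaded sketch of that contract (entry and file names are made up; this is not the actual ConcurrentJarCreator code): the entry is deflated up front, the CRC and sizes are set on the ZipArchiveEntry, and the pre-compressed bytes are handed over via addRawArchiveEntry:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.ZipEntry;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

public class RawEntryExample {
    public static void main(String[] args) throws Exception {
        byte[] data = "hello raw entry".getBytes(StandardCharsets.UTF_8);

        // Deflate the entry up front, as a worker thread would.
        // nowrap=true produces the raw deflate stream that ZIP entries use.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(compressed, deflater)) {
            dos.write(data);
        }
        deflater.end();

        // CRC, size and compressed size are all known before the entry is added...
        CRC32 crc = new CRC32();
        crc.update(data);
        ZipArchiveEntry entry = new ZipArchiveEntry("hello.txt"); // made-up entry name
        entry.setMethod(ZipEntry.DEFLATED);
        entry.setCrc(crc.getValue());
        entry.setSize(data.length);
        entry.setCompressedSize(compressed.size());

        // ...so addRawArchiveEntry can write a complete local file header straight away
        // and a trailing data descriptor record carries no extra information.
        try (ZipArchiveOutputStream zos =
                new ZipArchiveOutputStream(new File("raw-entry.zip"))) { // made-up file name
            zos.addRawArchiveEntry(entry, new ByteArrayInputStream(compressed.toByteArray()));
        }
    }
}
```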
[1] The minimum supported ZIP specification version needed to extract the file (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT)
[2] The data descriptor record contains the CRC checksum, compressed size and uncompressed size
The issue was discovered because a firewall (most probably an application layer firewall with advanced introspection) rejected the download of a zip file; see https://bugs.eclipse.org/bugs/show_bug.cgi?id=516424
I can confirm that after updating commons-compress to the latest 1.15-SNAPSHOT and executing the steps to reproduce in my first comment above, I no longer get any warnings when running
zip -F --out fixed.zip archive1.zip
This is a follow-up to Tycho bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=516432
ZipArchiver creates zips about which Info-ZIP complains with warnings for every file entry:
copying: one.txt
zip warning: Local Version Needed To Extract does not match CD: one.txt
Steps to reproduce:
(or any other zip archive created by ZipArchiverTest)
Interestingly, if you uncomment the @Ignore annotations in ConcurrentJarCreatorTest (and do the manual setup by copying some sample files to ${user.home}/lsrc/plexus/), the zip files created by this test (${user.home}/multiStream2-parallel.zip) are validated just fine by Info-ZIP.
That means ConcurrentJarCreator is doing it right while ZipArchiver is producing the warnings about the central directory entries.
We narrowed this bug down to having been introduced between plexus-archiver 2.9.1 and 2.10-beta-1.
We think it was introduced with commit https://github.com/codehaus-plexus/plexus-archiver/commit/f1cb37ad00157e3fb02a8825680d7a2c35a64bc3, which first introduced concurrent zip/jar file creation.
I am using Info-ZIP on macOS Sierra, but the problem was originally reported on Linux.
$ zip --version
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP. Currently maintained by E. Gordon.
Please send bug reports to the authors using the web page at www.info-zip.org; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip, as of above date; see http://www.info-zip.org/ for other sites.
Compiled with gcc 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34) for Unix (Mac OS X) on Oct 14 2016.
Zip special compilation options:
  USE_EF_UT_TIME       (store Universal Time)
  SYMLINK_SUPPORT      (symbolic links supported)
  LARGE_FILE_SUPPORT   (can read and write large files on file system)
  ZIP64_SUPPORT        (use Zip64 to store large files in archives)
  STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
  UIDGID_16BIT         (old Unix 16-bit UID/GID extra field also used)
  [encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)