codehaus-plexus / plexus-archiver

https://codehaus-plexus.github.io/plexus-archiver/
Apache License 2.0

ZipArchiver creates archives with inconsistent central directory entries #57

Closed jsievers closed 7 years ago

jsievers commented 7 years ago

this is a followup on Tycho bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=516432

ZipArchiver creates zips that Info-ZIP complains about, emitting a warning for every file entry:

copying: one.txt
zip warning: Local Version Needed To Extract does not match CD: one.txt

Steps to reproduce:

  1. run the plexus-archiver unit tests.
  2. go to plexus-archiver/target/output
  3. zip -F --out fixed.zip archive1.zip

(or any other zip archive created by ZipArchiverTest)

Interestingly, if you uncomment the @Ignores in ConcurrentJarCreatorTest (and do the manual setup by copying some sample files to ${user.home}/lsrc/plexus/), the zip file created by this test (${user.home}/multiStream2-parallel.zip) is validated just fine by Info-ZIP.

That means ConcurrentJarCreator is doing it right, while ZipArchiver is producing the warnings about the central directory entries.

We narrowed this bug down to being introduced between plexus-archiver 2.9.1 and 2.10-beta-1.

We think it was introduced with commit https://github.com/codehaus-plexus/plexus-archiver/commit/f1cb37ad00157e3fb02a8825680d7a2c35a64bc3, which first introduced concurrent zip/jar file creation.

I am using Info-ZIP on macOS Sierra, but the problem was originally reported on Linux.

$ zip --version
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Currently maintained by E. Gordon.  Please send bug reports to the authors using the web page at www.info-zip.org; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip, as of above date; see http://www.info-zip.org/ for other sites.

Compiled with gcc 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34) for Unix (Mac OS X) on Oct 14 2016.

Zip special compilation options:
        USE_EF_UT_TIME       (store Universal Time)
        SYMLINK_SUPPORT      (symbolic links supported)
        LARGE_FILE_SUPPORT   (can read and write large files on file system)
        ZIP64_SUPPORT        (use Zip64 to store large files in archives)
        STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
        UIDGID_16BIT         (old Unix 16-bit UID/GID extra field also used)
        [encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)

plamentotev commented 7 years ago

Hi,

Is there something that does not work when you use the produced archives, or is it only the warning you get from zip -F? The warning appears because the Version Needed To Extract field in the local header does not match the one in the central directory. The CRC checksum is OK (one of your comments suggests that the CRC checksum does not match the one in the CD, but in fact it does).
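For what it's worth, the two fields can be compared directly by reading the raw bytes of an archive. Below is a minimal JDK-only sketch; it builds a throwaway zip with java.util.zip.ZipOutputStream (so it illustrates the check itself, not the plexus-archiver bug) and compares the Version Needed To Extract value in the local file header against the one in the central directory:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class VersionCheck {

    // Little-endian unsigned 16-bit read.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    // Index of the first occurrence of a 4-byte signature, or -1.
    static int find(byte[] b, int b0, int b1, int b2, int b3) {
        for (int i = 0; i + 4 <= b.length; i++) {
            if ((b[i] & 0xff) == b0 && (b[i + 1] & 0xff) == b1
                    && (b[i + 2] & 0xff) == b2 && (b[i + 3] & 0xff) == b3) {
                return i;
            }
        }
        return -1;
    }

    // True when the local file header and the central directory agree on
    // "Version Needed To Extract" for the first entry.
    static boolean versionsMatch() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            zos.putNextEntry(new ZipEntry("one.txt"));
            zos.write("hello".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        byte[] zip = baos.toByteArray();

        // Local file header: signature PK\3\4, version needed at offset +4.
        int lfh = find(zip, 0x50, 0x4b, 0x03, 0x04);
        // Central directory header: signature PK\1\2, version needed at
        // offset +6 (offset +4 is "version made by").
        int cd = find(zip, 0x50, 0x4b, 0x01, 0x02);
        return u16(zip, lfh + 4) == u16(zip, cd + 6);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("versions match: " + versionsMatch());
    }
}
```

These are exactly the two values that zip -F compares, which is where the "Local Version Needed To Extract does not match CD" warning comes from.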

Nevertheless, this is strange behavior and I'll take a look at it. There is another difference between the archives created by ConcurrentJarCreatorTest and ZipArchiverTest: the file created by ZipArchiverTest contains data descriptor records, even though a) the data descriptor flag is not set and b) the local header's CRC sum and entry lengths are not set to zero. The file created by ConcurrentJarCreatorTest has no data descriptor records at all. I think those records are not needed.

plamentotev commented 7 years ago

I found where the problem is.

Short version

The central directory Version Needed To Extract[1] field is set to 2.0, while the local file header field is set to 1.0. In my opinion it is completely safe to ignore the warning. Still, this is a bug in commons-compress and I'll report it.

Long version

I was wrong about the data descriptor flag - it is set correctly, but the rest stands. The difference between the archives created by ConcurrentJarCreatorTest and AbstractZipArchiver is that the former constructs the ZipArchiveOutputStream from a File instance and the latter from a BufferedOutputStream. In the first case ZipArchiveOutputStream has random access to the archive file (as long as the file system supports such access); in the second case it does not. Why does random access matter? The CRC checksum, compressed size and uncompressed size fields are part of the local file header, and the local file header is placed before the compressed data. So if you don't have random access, you set the CRC checksum, compressed size and uncompressed size fields to zero, write the compressed data, and then append a data descriptor record[2] with the correct values. Unfortunately, due to a bug in commons-compress, the values of Version Needed To Extract in the local file header and in the central directory do not match when data descriptor records are added.
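The streamed path can be observed with the JDK's own java.util.zip.ZipOutputStream (not commons-compress, so this is only an illustration of the mechanism): writing a deflated entry to a plain, non-seekable stream forces the data-descriptor flag (general purpose bit 3) on, leaves the CRC field in the local file header at zero, and appends a data descriptor after the entry data:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DataDescriptorDemo {

    // Little-endian unsigned reads.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    static long u32(byte[] b, int off) {
        return (u16(b, off) & 0xffffL) | ((long) u16(b, off + 2) << 16);
    }

    // Does the archive contain the (optional) data descriptor
    // signature PK\7\8 anywhere?
    static boolean hasDataDescriptorSig(byte[] b) {
        for (int i = 0; i + 4 <= b.length; i++) {
            if (b[i] == 0x50 && b[i + 1] == 0x4b
                    && b[i + 2] == 0x07 && b[i + 3] == 0x08) {
                return true;
            }
        }
        return false;
    }

    // One deflated entry written to a non-seekable in-memory stream:
    // the writer cannot seek back to patch the local header, so it must
    // defer CRC/sizes to a data descriptor.
    static byte[] streamedZip() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            zos.putNextEntry(new ZipEntry("one.txt"));
            zos.write("hello".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] zip = streamedZip();
        // Local file header layout, starting at offset 0:
        // sig(4) version(2) flags(2) method(2) time(2) date(2) crc(4) ...
        int flags = u16(zip, 6);
        long crcInHeader = u32(zip, 14);
        System.out.println("bit3set=" + ((flags & 0x8) != 0)
                + " crcInHeader=" + crcInHeader
                + " descriptor=" + hasDataDescriptorSig(zip));
    }
}
```

With random access (the File-based constructor) the writer can instead seek back and patch the real CRC and sizes into the local header, and no descriptor is needed.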

It is my opinion that the data descriptor records are not needed in our case. The individual entries are not compressed by the ZipArchiveOutputStream; they are compressed separately, in parallel, and then just added using ZipArchiveOutputStream#addRawArchiveEntry. So ZipArchiveOutputStream knows the CRC checksum, compressed size and uncompressed size values, as they are already calculated. Those values can be set directly in the local file header, and that is exactly what happens - the resulting archive has the correct values in the local file header, not zeros. But if the header contains the correct values, then the data descriptor records are redundant. commons-compress could be improved in this regard so it does not add data descriptor records when they are not needed.

[1] The minimum supported ZIP specification version needed to extract the file (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT)
[2] The data descriptor record contains the CRC checksum, compressed size and uncompressed size.

plamentotev commented 7 years ago

FYI:

https://issues.apache.org/jira/browse/COMPRESS-394
https://issues.apache.org/jira/browse/COMPRESS-395

jsievers commented 7 years ago

The issue was discovered because a firewall (most probably an application-layer firewall with advanced introspection) rejected the download of a zip file; see https://bugs.eclipse.org/bugs/show_bug.cgi?id=516424

jsievers commented 7 years ago

I can confirm that after updating commons-compress to the latest 1.15-SNAPSHOT and executing the steps to reproduce from my first comment above, I no longer get any warnings when running

zip -F --out fixed.zip archive1.zip