gdraheim / zziplib

The ZZIPlib provides read access on ZIP-archives and unpacked data. It features an additional simplified API following the standard Posix API for file access

Problems to unzip file / support zip64 locator format #104

Open beckmi opened 3 years ago

beckmi commented 3 years ago

I have a zip archive with only one compressed file in it. Using unzip it is possible to uncompress it. Using zziplib it is not possible because the dirent of this file is as follows:

(gdb) p dirent
$4 = {z_magic = "PK\001\002", z_encoder = {version = "-", ostype = ""}, z_extract = {version = "-", ostype = ""}, z_flags = "\b", z_compr = "\b", z_dostime = {time = "\353S", date = "GQ"}, z_crc32 = "|#\301\215", z_csize = "\377\377\377\377", z_usize = "\377\377\377\377", z_namlen = "\034", z_extras = "\034", z_comment = "\000", z_diskstart = "\000", z_filetype = "\000", z_filemode = "\000\000\000", z_offset = "\377\377\377\377"}

It is possible to list the content, but it is not possible to open the compressed file, because z_offset is -1 (0xFFFFFFFF). I have attached the file in question. It comes from the SFTP site of ENTSO-E. All their compressed files have the same problem.
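A minimal sketch of how the 4-byte fields in the dirent above decode, and why z_csize, z_usize and z_offset all read as -1. The helper names `get_le32` and `field_needs_zip64` are hypothetical and not part of the zziplib API; the only assumption is that the fields are raw little-endian byte arrays, as the gdb output suggests.

```c
#include <stdint.h>

/* Value a 32-bit size/offset field takes when the real value is
 * stored elsewhere (in the ZIP64 extended information extra field). */
#define ZIP64_SENTINEL32 0xFFFFFFFFu

/* Decode a 4-byte little-endian field as stored in a central
 * directory entry. Hypothetical helper, not a zziplib function. */
uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Nonzero when the field holds the sentinel, i.e. the real 64-bit
 * value has to be looked up in the entry's extra data. */
int field_needs_zip64(const unsigned char field[4])
{
    return get_le32(field) == ZIP64_SENTINEL32;
}
```

Applied to the dump above, the three "\377\377\377\377" fields are flagged, while z_crc32 decodes to an ordinary value.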

beckmi commented 3 years ago

2020_10_OutagesPUReasons.zip

the lost attachment ...

gdraheim commented 3 years ago

It seems that PKWARE has changed its published standards.

4.3.9.2 When compressing files, compressed and uncompressed sizes SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF. However ZIP64 format MAY be used regardless of the size of a file. When extracting, if the zip64 extended information extra field is present for the file the compressed and uncompressed sizes will be 8 byte values.

4.4.16 relative offset of local header: (4 bytes) This is the offset from the start of the first disk on which this file appears, to where the local header SHOULD be found. If an archive is in ZIP64 format and the value in this field is 0xFFFFFFFF, the size will be in the corresponding 8 byte zip64 extended information extra field.

4.5.3 -Zip64 Extended Information Extra Field (0x0001): The following is the layout of the zip64 extended information "extra" block. If one of the size or offset fields in the Local or Central directory record is too small to hold the required data, a Zip64 extended information record is created. The order of the fields in the zip64 extended information record is fixed, but the fields MUST only appear if the corresponding Local or Central directory record field is set to 0xFFFF or 0xFFFFFFFF.

Note: all fields stored in Intel low-byte/high-byte order.

 Value                    Size       Description
 -----                    ----       -----------
 0x0001                   2 bytes    Tag for this "extra" block type (ZIP64)
 Size                     2 bytes    Size of this "extra" block
 Original Size            8 bytes    Original uncompressed file size
 Compressed Size          8 bytes    Size of compressed data
 Relative Header Offset   8 bytes    Offset of local header record
 Disk Start Number        4 bytes    Number of the disk on which this file starts
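The layout above could be read roughly as follows. This is only a sketch under stated assumptions, not zziplib code: the struct `zip64_values` and the function `zip64_apply_extra` are hypothetical names, and the caller is assumed to have already copied the classic 32-bit and 16-bit values from the central directory entry into the struct. Per 4.5.3, a field is read from the block only if its classic counterpart is saturated. The dirent in the report has z_extras = 28, which would be consistent with a 4-byte block header plus three 8-byte values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the values of one central directory entry.
 * On input: the classic values, widened. On output: ZIP64 values
 * substituted wherever the classic field was 0xFFFFFFFF / 0xFFFF. */
typedef struct {
    uint64_t usize;      /* uncompressed size */
    uint64_t csize;      /* compressed size */
    uint64_t offset;     /* offset of local header */
    uint32_t diskstart;  /* disk number where the file starts */
} zip64_values;

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t rd_le(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Walk the extra data of one entry and apply the ZIP64 block (tag
 * 0x0001) if present. Returns 1 if applied, 0 if no such block was
 * found, -1 if the extra data is truncated or malformed. */
int zip64_apply_extra(const unsigned char *extra, size_t len, zip64_values *v)
{
    size_t pos = 0;
    while (len - pos >= 4) {
        uint16_t tag  = rd16(extra + pos);
        uint16_t size = rd16(extra + pos + 2);
        pos += 4;
        if (size > len - pos)
            return -1;                      /* block runs past the end */
        if (tag == 0x0001) {
            const unsigned char *p = extra + pos;
            size_t left = size;
            /* Fixed order; each field present only if saturated. */
            if (v->usize == 0xFFFFFFFFu) {
                if (left < 8) return -1;
                v->usize = rd_le(p, 8); p += 8; left -= 8;
            }
            if (v->csize == 0xFFFFFFFFu) {
                if (left < 8) return -1;
                v->csize = rd_le(p, 8); p += 8; left -= 8;
            }
            if (v->offset == 0xFFFFFFFFu) {
                if (left < 8) return -1;
                v->offset = rd_le(p, 8); p += 8; left -= 8;
            }
            if (v->diskstart == 0xFFFFu) {
                if (left < 4) return -1;
                v->diskstart = (uint32_t)rd_le(p, 4);
            }
            return 1;
        }
        pos += size;                        /* skip unrelated block */
    }
    return 0;
}
```

A caller would fill `zip64_values` from z_usize, z_csize, z_offset and z_diskstart, pass the z_extras bytes that follow the file name, and use the returned 64-bit offset to seek to the local header.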

So far the zziplib can read a ZIP64 central directory but it does not read a ZIP64 extras block.

The real bug here is that the file you provided does NOT contain a ZIP64 central directory trailer (magic PK\6\6) but only a normal ZIP central directory trailer (magic PK\5\6), so the use of a ZIP64 extras block is at least unintended, since the usage of 0xFFFF as an extension marker was defined for the ZIP64 file format.

It could be implemented however.

gdraheim commented 3 years ago

After a bit more debugging I can see that the ZIP64 trailer is not used, but there is a ZIP64 locator (PK\6\7) instead. The PKWARE appnote documentation says that it was introduced in version 6.2 in 2004/2005. Bewildering as it may seem, zziplib is older (going back to the 1990s).
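For reference, a sketch of how such a locator could be detected. It assumes the layout given in the appnote (4-byte signature, 4-byte disk number, 8-byte offset of the ZIP64 end of central directory record, 4-byte total disk count, 20 bytes in all) and that the locator sits directly in front of the normal PK\5\6 record. The function name `zip64_read_locator` and its parameters are hypothetical, not zziplib API; `buf` is assumed to be the tail of the archive held in memory and `eocd_pos` the index of the PK\5\6 record inside it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZIP64_LOCATOR_SIZE 20

/* Look for a ZIP64 end-of-central-directory locator immediately
 * before the classic end record at buf[eocd_pos]. Returns 1 and
 * stores the 64-bit offset of the ZIP64 end record if a locator is
 * found, 0 otherwise. Hypothetical helper, not a zziplib function. */
int zip64_read_locator(const unsigned char *buf, size_t len,
                       size_t eocd_pos, uint64_t *zip64_eocd_offset)
{
    if (eocd_pos > len || eocd_pos < ZIP64_LOCATOR_SIZE)
        return 0;                       /* no room for a locator */

    const unsigned char *loc = buf + eocd_pos - ZIP64_LOCATOR_SIZE;
    if (memcmp(loc, "PK\006\007", 4) != 0)
        return 0;                       /* signature does not match */

    /* Bytes 8..15: little-endian offset of the ZIP64 end record. */
    uint64_t off = 0;
    for (int i = 7; i >= 0; i--)
        off = (off << 8) | loc[8 + i];
    *zip64_eocd_offset = off;
    return 1;
}
```

The returned offset points at the record with magic PK\6\6, which would then be parsed in place of the classic trailer.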

I checked whether I could implement the functionality, but quite a lot of logic would need to change, so this is a real feature request rather than just a bug fix. I am sorry, but this will not happen anytime soon.

beckmi commented 3 years ago

Nevertheless – thanks for the work done so far. Is an open spec downloadable?

Have found it. For others who search it: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

gdraheim commented 3 years ago

Yes, that's it. When PKWARE created the first zip.exe they shipped the package with an APPNOTE.TXT file which described (parts of) the file format. That name stuck and was later used for the standardisation proposal as well.

Here's the official archive: https://support.pkware.com/home/pkzip/developer-tools/appnote/application-note-archives