dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

"A local file header is corrupt" error occurs while unpacking the ZIP archive #49580

Closed: Albeoris closed 2 years ago

Albeoris commented 3 years ago

Description

In the process of working with a large number of .zip archives from various sources, I ran into a problem when unpacking some of them.

Configuration

Regression?

No, a similar problem exists in .NET Framework 4.7.2.

Other information

ZipArchiveEntry.cs :: IsOpenable

                // _compressedSize is (long) 4294967295 => ffffffff
                if (OffsetOfCompressedData + _compressedSize > _archive.ArchiveStream.Length)
                {
                    message = SR.LocalFileHeaderCorrupt;
                    return false;
                }

ZipBlocks.cs :: TryReadBlock

            bool uncompressedSizeInZip64 = uncompressedSizeSmall == ZipHelper.Mask32Bit; // true
            bool compressedSizeInZip64 = compressedSizeSmall == ZipHelper.Mask32Bit; // true
            bool relativeOffsetInZip64 = relativeOffsetOfLocalHeaderSmall == ZipHelper.Mask32Bit; // false
            bool diskNumberStartInZip64 = diskNumberStartSmall == ZipHelper.Mask16Bit; // false

ZipBlocks.cs :: TryGetZip64BlockFromGenericExtraField

                    zip64Block._size = extraField.Size;

                    ushort expectedSize = 0;

                    if (readUncompressedSize) expectedSize += 8; // true
                    if (readCompressedSize) expectedSize += 8; // true
                    if (readLocalHeaderOffset) expectedSize += 8; // false
                    if (readStartDiskNumber) expectedSize += 4;  // false

                    // expectedSize is 16
                    // zip64Block._size is 28
                    if (expectedSize != zip64Block._size)
                        return false;

                    // unreachable code 
                    if (readUncompressedSize) zip64Block._uncompressedSize = reader.ReadInt64();
                    if (readCompressedSize) zip64Block._compressedSize = reader.ReadInt64();
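The size check traced above can be restated as a small calculation. This is a minimal sketch in Python rather than the actual System.IO.Compression code; the function name `expected_zip64_size` is hypothetical, and the flags and the observed size of 28 are taken from the comments in the snippet above.

```python
def expected_zip64_size(read_uncompressed, read_compressed, read_offset, read_disk):
    """Bytes the zip64 field is expected to hold, given which header
    fields carried the 0xFFFFFFFF / 0xFFFF sentinel."""
    size = 0
    if read_uncompressed:
        size += 8  # 64-bit uncompressed size
    if read_compressed:
        size += 8  # 64-bit compressed size
    if read_offset:
        size += 8  # 64-bit local header offset
    if read_disk:
        size += 4  # 32-bit starting disk number
    return size

# Flags as reported in the trace above: only the two sizes are sentinels.
expected = expected_zip64_size(True, True, False, False)
actual = 28  # size declared by the archive's zip64 extra field
strict_ok = expected == actual

print(expected, actual, strict_ok)  # 16 28 False
```

The strict equality is what rejects the field: the archive writes all four values (8 + 8 + 8 + 4 = 28 bytes) even though only two header fields are sentinels.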

Here is the ZipInfo output for the given archive. The archive is valid and opens correctly in all current archivers.

There is no zipfile comment.

End-of-central-directory record:
-------------------------------

  Zip archive file size:                      7414 (0000000000001CF6h)
  Actual end-cent-dir record offset:          7316 (0000000000001C94h)
  Expected end-cent-dir record offset:        7316 (0000000000001C94h)
  (based on the length of the central directory and its expected offset)

  This zipfile constitutes the sole disk of a single-part archive; its
  central directory contains 1 entry.
  The central directory is 151 (0000000000000097h) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 7165 (0000000000001BFDh).

Central directory entry #1:
---------------------------

  file.txt

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   4.5
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   4.5
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2021 Mar 13 18:11:52
  32-bit CRC value (hex):                         1b0e1343
  compressed size:                                7042 bytes
  uncompressed size:                              93523 bytes
  length of filename:                             37 characters
  length of extra field:                          68 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  The central-directory extra field contains:
  - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 28 data bytes.  The first
    20 are:   53 6d 01 00 00 00 00 00 82 1b 00 00 00 00 00 00 00 00 00 00.
  - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
    20 are:   00 00 00 00 01 00 18 00 4b 75 22 0f 76 04 d7 01 4b 75 22 0f.

  There is no file comment.
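As a cross-check of the listing above, the first 16 of the 20 bytes shown for the 0x0001 subfield can be decoded as two little-endian 64-bit integers. This is a sketch in Python using only the bytes printed by ZipInfo; the variable names are illustrative.

```python
import struct

# The 20 data bytes ZipInfo printed for the PKWARE 64-bit sizes subfield.
shown = bytes.fromhex(
    "53 6d 01 00 00 00 00 00 82 1b 00 00 00 00 00 00 00 00 00 00"
)

# The zip64 field stores the uncompressed size first, then the compressed
# size, each as an unsigned little-endian 64-bit integer.
uncompressed, compressed = struct.unpack_from("<QQ", shown, 0)

print(uncompressed, compressed)  # 93523 7042
```

Both values match the "uncompressed size" and "compressed size" lines in the listing, so the real sizes are present in the extra field; the remaining bytes of the 28-byte subfield are the part the strict size check does not expect.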

A similar problem was mentioned earlier, but it was related to large files: https://github.com/dotnet/runtime/issues/1094

dotnet-issue-labeler[bot] commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

danmoseley commented 3 years ago

Is it possible to share a zip that repros? Although it may be some time before we can take a look, perhaps you are comfortable debugging further.

Albeoris commented 3 years ago

Here you are: test.zip

maikebing commented 3 years ago

The problem seems to remain. OS: CentOS Linux release 7.6.1810 (Core), .NET 5.0.621.22011.

I'm trying to solve this problem using the following method: https://github.com/dotnet/runtime/issues/1094#issuecomment-610260232

Most archives extract normally, but occasionally the following occurs:

Exception has been thrown by the target of an invocation.

I could not reproduce the problem locally on Windows.

I'll try to get further information.

danmoseley commented 3 years ago

Thanks for the info. @maikebing it might be interesting if you could break under a debugger and see what return code we're getting from where. I assume you are using x64?

To set expectations, it might be a while before we look at this on our side. Breaking in a debugger at the point the exception is thrown might suggest whether it's .NET code or zlib. We are using the latest zlib (https://github.com/madler/zlib/releases/tag/v1.2.11); they haven't updated for a couple of years. If there is some other zlib-based tool (such as the platform 'unzip' command, perhaps), it would be interesting to know whether it repros the problem.

0xced commented 3 years ago

And here are reproduction steps.

Issue49580.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;net5.0</TargetFrameworks>
    <LangVersion>9.0</LangVersion>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="System.IO.Compression" Version="4.3.0" />
  </ItemGroup>

</Project>

Program.cs

using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.InteropServices;

// Demonstrates issue described on https://github.com/dotnet/runtime/issues/49580

try
{
    // Produced with `xxd -i test.zip` with file from https://github.com/dotnet/runtime/files/6135119/test.zip
    var zipData = new byte[]
    {
        0x50, 0x4b, 0x03, 0x04, 0x2d, 0x00, 0x00, 0x08, 0x08, 0x00, 0x17, 0x9b, 0x6d, 0x52, 0x0c, 0x7e, 0x7f, 0xd8, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0x08, 0x00, 0x38, 0x00, 0x66, 0x69, 0x6c, 0x65, 0x2e, 0x74, 0x78, 0x74, 0x01, 0x00, 0x10, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x18, 0x00, 0xa8, 0xb1,
        0xf6, 0x61, 0x25, 0x18, 0xd7, 0x01, 0xa8, 0xb1, 0xf6, 0x61, 0x25, 0x18, 0xd7, 0x01, 0xa8, 0xb1, 0xf6, 0x61, 0x25, 0x18, 0xd7, 0x01, 0x2b, 0x49,
        0x2d, 0x2e, 0x01, 0x00, 0x50, 0x4b, 0x01, 0x02, 0x2d, 0x00, 0x2d, 0x00, 0x00, 0x08, 0x08, 0x00, 0x17, 0x9b, 0x6d, 0x52, 0x0c, 0x7e, 0x7f, 0xd8,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x08, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x66, 0x69, 0x6c, 0x65, 0x2e, 0x74, 0x78, 0x74, 0x01, 0x00, 0x1c, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x20, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x01, 0x00, 0x18, 0x00, 0xa8, 0xb1, 0xf6, 0x61, 0x25, 0x18, 0xd7, 0x01, 0xa8, 0xb1, 0xf6, 0x61, 0x25, 0x18, 0xd7, 0x01, 0xa8, 0xb1,
        0xf6, 0x61, 0x25, 0x18, 0xd7, 0x01, 0x50, 0x4b, 0x06, 0x06, 0x2c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x00, 0x2d, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7a, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50, 0x4b, 0x06, 0x07, 0x00, 0x00, 0x00, 0x00, 0xde, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x50, 0x4b, 0x05, 0x06, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0x7a, 0x00,
        0x00, 0x00, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00
    };

    Console.WriteLine($"Extracting test.zip on {RuntimeInformation.OSDescription.Trim()} ({RuntimeInformation.FrameworkDescription})");

    using var archive = new ZipArchive(new MemoryStream(zipData));
    foreach (var entry in archive.Entries)
    {
        Console.WriteLine($"{entry} (CompressedLength: {entry.CompressedLength} -- Length: {entry.Length})");
        using var _ = entry.Open(); // throws System.IO.InvalidDataException (A local file header is corrupt)
    }
    return 0;
}
catch (Exception exception)
{
    Console.Error.WriteLine(exception);
    return 1;
}

Result on .NET Framework 4.8 (dotnet run -f net48):

Extracting test.zip on Microsoft Windows 10.0.18363 (.NET Framework 4.8.4250.0)
file.txt (CompressedLength: 4294967295 -- Length: 4294967295)
System.IO.InvalidDataException: A local file header is corrupt.
   at System.IO.Compression.ZipArchiveEntry.OpenInReadMode(Boolean checkOpenable)
   at <Program>$.<Main>$(String[] args) in Program.cs:line 35

Result on .NET 5 (dotnet run -f net5.0):

Extracting test.zip on Microsoft Windows 10.0.18363 (.NET 5.0.7)
file.txt (CompressedLength: 4294967295 -- Length: 4294967295)
System.IO.InvalidDataException: A local file header is corrupt.
   at System.IO.Compression.ZipArchiveEntry.OpenInReadMode(Boolean checkOpenable)
   at System.IO.Compression.ZipArchiveEntry.Open()
   at <Program>$.<Main>$(String[] args) in Program.cs:line 35

danmoseley commented 3 years ago

                if (OffsetOfCompressedData + _compressedSize > _archive.ArchiveStream.Length)
                {
                    message = SR.LocalFileHeaderCorrupt;
                    return false;
                }

(long)94 + (long)4294967295 > (long)320.

Note 320 is an int, because it is Length on a Stream. I don't know the code, but I suspect it is not intended to compare longs with ints.

Albeoris commented 3 years ago

"(long)94 + (long)4294967295 > (int)320."

4294967295 is 0xFFFFFFFF => -1

"Note 320 is an int, because it is Length on a Stream. I don't know the code, but I suspect it is not intended to compare longs with ints."

Stream.Length is long.
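The arithmetic behind the failing check can be worked through directly. This is a sketch in Python, not the .NET code: the function name `looks_corrupt` is hypothetical, 94 and 320 are the values quoted above, and the compressed size of 6 is read from the zip64 extra field in the repro bytes posted earlier (`0x06` followed by seven zero bytes).

```python
MASK32 = 0xFFFFFFFF  # sentinel meaning "the real value is in the zip64 field"

# In the repro archive: 30-byte fixed local header + 8-byte file name
# + 56-byte (0x38) extra field.
offset_of_compressed_data = 30 + 8 + 56
archive_length = 320

def looks_corrupt(compressed_size):
    """Same comparison as the IsOpenable check quoted in this thread."""
    return offset_of_compressed_data + compressed_size > archive_length

with_sentinel = looks_corrupt(MASK32)  # zip64 field rejected, sentinel kept
with_real_size = looks_corrupt(6)      # size taken from the zip64 field

print(offset_of_compressed_data, with_sentinel, with_real_size)  # 94 True False
```

Seen this way, the comparison itself is sound with 64-bit operands; it fails only because the sentinel was never replaced by the real size once the zip64 field was rejected.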

ryanwilliams83 commented 3 years ago

Please fix; this bug also manifests in PowerShell's Expand-Archive as "Unable to remove file" (or similar) errors.

0xced commented 3 years ago

For the record, here's an example of how this issue can manifest in real life:

vmachacek commented 2 years ago

please fix this

danmoseley commented 2 years ago

OK, apologies for not looking into this earlier, especially given the excellent description and ideal repro above.

0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x08, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00

So as noted, the compressed and uncompressed sizes are both 0xFFFFFFFF. The file name length is 0x0008 and the extra field length is 0x0044. 'disk number start' and 'external file attributes' are both 0x0.
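For readers who want to check the field boundaries, the 24 quoted bytes can be unpacked according to the central directory header layout, starting at the compressed size field. This is a sketch in Python; the variable names are illustrative and the last two of the 24 bytes (the start of the relative offset field) are left unparsed.

```python
import struct

# The 24 bytes quoted above: 8 x 0xff, then 08 00 44 00, then 12 zero bytes.
quoted = bytes.fromhex("ff" * 8 + "0800" + "4400" + "00" * 12)

# Layout from the compressed size field onward (little-endian):
# compressed size (4), uncompressed size (4), file name length (2),
# extra field length (2), file comment length (2), disk number start (2),
# internal attributes (2), external attributes (4).
(compressed, uncompressed, name_len, extra_len,
 comment_len, disk_start, internal_attrs, external_attrs) = struct.unpack_from(
    "<IIHHHHHI", quoted, 0
)

print(hex(compressed), hex(uncompressed), name_len, extra_len, disk_start)
# 0xffffffff 0xffffffff 8 68 0
```

Only the two size fields carry the sentinel; the disk number start is a plain 0, which is why the reader expects 16 bytes of zip64 data rather than 28.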

The spec says that if they are 0xFFFF / 0xFFFFFFFF respectively, the real values are in the extra field; it does not say that if they aren't those values, they are absent from the extra field:

4.4.13 disk number start: (2 bytes)

  The number of the disk on which this file begins.  If an 
  archive is in ZIP64 format and the value in this field is 
  0xFFFF, the size will be in the corresponding 4 byte zip64 
  extended information extra field.

But the Zip64 extended field section seems completely explicit that they ONLY appear if they were 0xFFFF / 0xFFFFFFFF. Our code follows this, and identifies the unexpected length as a corruption.

4.5.3

  The order of the fields in the zip64 extended
  information record is fixed, but the fields MUST
  only appear if the corresponding Local or Central
  directory record field is set to 0xFFFF or 0xFFFFFFFF.

What do other implementations do?

The old WPF implementation mentioned in the past issue seems to make the same check and SharpCompress apparently does too.

However, Python apparently skips the specified length, whether or not it uses the fields: https://github.com/python/cpython/blob/main/Lib/zipfile.py#L514. Rust does the same: https://github.com/zip-rs/zip/blob/master/src/read.rs#L794. I think Go is not checking either (I'm not a Go speaker): https://cs.opensource.google/go/go/+/master:src/archive/zip/reader.go;l=354;bpv=0;bpt=1

My guess is that Python and the others' implementations are most battle-tested and we should trust the length.
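A lenient reader in that spirit might look like the following. This is a minimal sketch in Python under stated assumptions, not the code of CPython, Rust, Go, or .NET: the function name `read_zip64_lenient` is hypothetical, and the input is the 28-byte central-directory zip64 field from the repro posted earlier in this thread.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF

def read_zip64_lenient(data, uncompressed, compressed, offset, disk):
    """Replace each sentinel header value with the next value from the
    zip64 field, in the fixed order the spec gives, and ignore any bytes
    left over instead of treating them as corruption."""
    pos = 0
    if uncompressed == MASK32:
        (uncompressed,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if compressed == MASK32:
        (compressed,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if offset == MASK32:
        (offset,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if disk == MASK16:
        (disk,) = struct.unpack_from("<I", data, pos)
        pos += 4
    return uncompressed, compressed, offset, disk

# Repro's central-directory zip64 field: sizes 4 and 6, then 12 zero bytes.
field = bytes.fromhex("0400000000000000" + "0600000000000000" + "00" * 12)

result = read_zip64_lenient(field, MASK32, MASK32, 0, 0)
print(result)  # (4, 6, 0, 0)
```

The caller still advances by the declared subfield length to reach the next extra field, so trailing bytes cost nothing. A field that is too short still fails, because `struct.unpack_from` raises `struct.error` when the buffer runs out.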

danmoseley commented 2 years ago

For this zip, 7zip shows a warning "Characteristics: Extra_ERROR Zip64_ERROR NTFS : UTF8". From looking at https://sourceforge.net/p/sevenzip/discussion/45797/thread/13e7d575/#83a1, this can occur when 7zip sees that the zip64 extended field contains values whose corresponding 32-bit header fields were not all 0xFF. Evidently it still moves past those unexpected fields, as it reads the archive successfully.