dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.38k stars 4.75k forks source link

[API Proposal]: Add property to ZipArchiveEntry to indicate if the entry is encrypted #68897

Closed jeffhandley closed 2 years ago

jeffhandley commented 2 years ago

Background and motivation

By specification, Zip archives support "weak" and "strong" entry-level encryption. Weak encryption of an entry (via the ZipCrypto algorithm) is indicated in the entry-level General Purpose Bit Flag value by setting bit 0 to 1. Strong encryption (e.g. AES-*) is indicated with both bits 0 and 6 (PKZIP) or by setting bit 0 and using an extra data field (WinZip). With any encryption scheme, General Purpose Bit Flag bit 0 will be set if an entry is encrypted.

Our APIs do not currently expose any indicators for whether an archive contains any weak- or strong-encrypted entries, nor do we expose the General Purpose Bit Flag for advanced scenarios that require checking bits within that value. The end result is that calling code cannot determine if an entry is encrypted. Furthermore, when attempting to extract an encrypted entry from a ZipArchive, two different behaviors are observed:

  1. For weak (ZipCrypto) encryption, the encrypted data is received as the contents of the entry, and there is no means for identifying that the entry was encrypted. This can lead to invalid or seemingly corrupt data to be passed to the next subsystem.
  2. For strong encryption, an InvalidDataException is thrown when attempting to extract the entry. This exception type is also thrown in other circumstances, and there is no discriminating information to indicate the reason for the exception; thus, there is no means for identifying that the entry was encrypted.

Issue #57197 requests a means for detecting if a ZipArchiveEntry is encrypted while iterating over the entries in the archive.

API Proposal

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public bool IsEncrypted { get; }
    }
}

API Usage

try
{
    if (!entry.IsEncrypted)
    {
        using (var entryStream = entry.Open())
        {
            entryStream.CopyTo(documentStream);
        }
    }
    else
    {
        // App-specific logic for how to handle/log that the zip contained encrypted entries
    }
}
catch (InvalidDataException)
{
    // This will no longer occur for encrypted entries
}

Additional APIs to Consider

Expose an Archive-Level Property

We could additionally include an archive-level property to indicate if any of the entries are encrypted. However, since the encryption information is stored at the entry-level "General Purpose Bit Flag" value, populating this property would require iterating over the entries as is shown in the usage example above. If demand for this archive-level property exists, it could be added to ZipArchive alongside the existing Entries and Mode properties. Because ZipArchive currently contains no other aggregate information about the entries though, I recommend we do not add this property unless sufficient requests are received after adding the entry-level property.

using System.Collections.ObjectModel;

namespace System.IO.Compression
{
    public class ZipArchive
    {
        public ReadOnlyCollection<ZipArchiveEntry> Entries { get; }
        public ZipArchiveMode Mode { get; }
        public bool HasEncryptedEntries { get; } // Access would ensure the entries have been enumerated internally
    }
}

Specify Encryption Handling Behavior to ZipFile Accelerators

The ZipFile class provides static accelerator methods, including Open, OpenRead, and ExtractToDirectory. The Open and OpenRead methods simply return a ZipArchive without taking any other action and they succeed with weak- or strong-encrypted entries present. The ExtractToDirectory method exhibits the same behavior as described above, with an InvalidDataException being thrown for strong-encrypted entries and silently extracting encrypted entries for weak encryption.

We could add overloads to ZipFile.ExtractToDirectory to specify how encrypted entries should be handled. I recommend we do not pursue this unless sufficient demand emerges. At that time, we could learn more about the scenarios to determine if we need an enum to define different behaviors or if we merely need to allow a boolean to indicate encrypted entries should be skipped. Without these overloads, a caller can instead use ZipFile.Open[Read] and iterate over the entries.

Alternative Designs

Exposing the Entry Encryption Algorithm

An alternative design would use an enum to indicate the encryption algorithm used for an entry:

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public ZipEntryEncryption Encryption { get; }
    }

    public enum ZipEntryEncryption 
    {
        None,
        Weak, // ZipCrypto
        Aes128,
        Aes192,
        Aes256,
        Unknown = 0xFFFF,
    }
}

For Add password to ZipArchive · Issue #1545 · dotnet/runtime, an enum along these lines is very likely to be needed. Without fully designing the APIs for those features though, it's possible that the enum illustrated above might not be the right design for the full feature set. Since the enum is not strictly necessary for detecting if there are password-protected/encrypted entries, I recommend we stick with the simple boolean for now. If #1545 is implemented and we end up adding an Encryption property that uses an enum similar to what's shown here, then the IsEncrypted property would become a convenience method over checking if Encryption is ZipEntryEncryption.None or another value. For the initial implementation though, IsEncrypted would only need to be based on General Purpose Bit 0 being set.

Exposing the General Purpose Bit Flag Value

We could optionally expose the General Purpose Bit Flag value on ZipArchiveEntry, either in addition to or instead of IsEncrypted. Exposing this value would allow callers to check other bits to glean more about the archive entry than our APIs support.

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public ushort GeneralPurposeBits { get; }
    }
}

From there, we could also ostensibly introduce an enum that names each of the bits per the specification. There has not been demand for this low-level data to be exposed however, so I recommend we do not pursue this approach unless demand emerges.

netstandard support via Exception.Data

Some of the requests for this functionality came with the desire for netstandard support. Without adding new API surface, a failed extraction could include data on the exception instance that indicates why the InvalidDataException was thrown. For instance, Exception.Data could include a key/value pair of "IsEncrypted" and the boolean indicating if the entry was encrypted. However, this would rely on exception handling to control flow, which is an anti-pattern.

Risks

The Zip file format also includes the ability to encrypt some of the central directory metadata, including file names. This API design does not take that concept into account. Popular tools such as WinZip, 7-Zip, and WinRAR do not support this capability either though. More typically, a password is set for the zip such that it is applied to each of the entries individually, and the central directory metadata remains unencrypted.

Based on the specification for how central directory metadata encryption is implemented, the entry count values are to be obfuscated if the central directory is encrypted (reference). That expectation would likely introduce other issues with the current ZipArchive implementation if we sought to support encryption of the central directory. However, the file entry header General Purpose Bit Flag would remain the same, with bit 0 still indicating that the entry is also encrypted; thus the proposed design would still apply.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation We have received requests to be able to detect if a ZipArchive contains password-protected entries. Without this information, the consumer of a zip file cannot determine if the contents can be extracted ahead of time. During extraction of the entire archive or a single entry, an `InvalidDataException` is thrown if an entry is encrypted, but that exception can be thrown in other circumstances as well. By adding a property to indicate if an entry is encrypted, the entries could be inspected ahead of time and/or skipped during extraction. See https://github.com/dotnet/runtime/issues/57197 for more background on the requests. ### API Proposal ```csharp namespace System.IO.Compression; public class ZipArchiveEntry { public bool IsEncrypted { get; } } ``` ### API Usage ```csharp private bool ArchiveContainsEncryptedEntries(ZipArchive archice) { foreach (var entry in archive.Entries) { if (entry.IsEncrypted) { return true; } } return false; } ``` ### Alternative Designs ### Exposing an Archive-level Property We could consider additionally including an archive-level property to indicate if any of the entries are encrypted. However, populating this property would require iterating over the entries as is shown in the usage example above. Until such an API request is received, I recommend we limit the new API surface to the entry-level property. ### Exposing the Encryption Level Zip archives contain "weak" and "strong" encryption. Weak encryption (via the ZipCrypto algorithm) is indicated on a file with a single bit set within the General Purpose Bit Flags ([bit 0](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT#:~:text=Bit%200%3A%20If%20set%2C%20indicates%20that%20the%20file%20is%20encrypted.)). Strong encryption (e.g. AES-*) is indicated with [both bits 0 and 6](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT#:~:text=Bit%206%3A%20Strong%20encryption) being set in the General Purpose Bit Flags. An alternative design would use an enum to indicate the encryption level of an entry: * None * Weak * Strong Unless we intend to offer APIs in the future that would enable the extraction of encrypted entries, the weak vs. strong indicator is unnecessary. With that consideration, I recommend we use a simple boolean to indicate if the entry is encrypted, regardless of the encryption level or algorithm. And with the boolean approach, we only need to check General Purpose Bit 0, since that bit must be set for both weak and strong encryption. ### netstandard support via Exception.Data Some of the requests for this functionality came with the desire for netstandard support. Without adding new API surface, a failed extraction could include data on the exception instance that indicates why the `InvalidDataException` was thrown. For instance, `Exception.Data` could include a key/value pair of "IsEncrypted" and the boolean indicating if the entry was encrypted. However, this would rely on exception handling to control flow, which is an anti-pattern. ### Risks The Zip file format also includes the ability to encrypt some of the central directory metadata, including file names. This API design does not take that concept into account. Popular tools such as WinZip, 7-Zip, and WinRAR do not support this capability either though. More typically, a password is set for the zip such that it is applied to each of the entries individually, and the central directory metadata remains unencrypted. Based on the specification for how central directory metadata encryption is implemented, the entry count values are to be obfuscated if the central directory is encrypted ([reference](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT#:~:text=7.3.6%20When%20the%20Central%20Directory%20is%20compressed%20and/or%20encrypted)). That expectation would likely introduce other issues with the current ZipArchive implementation if we sought to support encryption of the central directory. However, the file entry header General Purpose Bit Flags would remain the same, with bit 0 still indicating that the entry is also encrypted; thus the proposed design would still apply.
Author: jeffhandley
Assignees: -
Labels: `api-suggestion`, `area-System.IO.Compression`
Milestone: 7.0.0
jozkee commented 2 years ago

For the record, PKWare strong encryption requires a license, see https://github.com/dotnet/runtime/issues/1545#issuecomment-566344263

The PKZIP encryption seems to require a license to implement. The WinZip encryption scheme does not.

WinZIP encryption doesn't use the bit 6, it uses an extra data field with header 0x9901.

Now, API-wise, I think bool IsEncrypted can be a convenient API to just skip the encrypted entries for now, but once we address reading and creation of encrypted entries (which is very likely given the popularity of the issue), we may need another field to describe the AES strength (128, 192, or 256 bit).

My point being that once we extend the encryption support, bool IsEncrypted field will be redundant (and somewhat obsolete) compared to an enum that tells you more about the encryption information of the entry.

Yet another alternative, similar to your "Exposing the Entry Encryption Level" design, could be:


namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
+        public ZipEntryEncryption Encryption { get; }
    }

+    public enum ZipEntryEncryption 
+    {
+        None,
+        Weak, // ZipCrypto
+        Aes128,
+        Aes192,
+        Aes256,
+        Unknown = 0xFFFF,
+    }
}
jeffhandley commented 2 years ago

Thanks, @Jozkee; I had not read the WinZip documentation you linked to.

With the enum approach, callers that needs to determine if an entry is encrypted would use if (entry.Encryption != ZipEntryEncryption.None), which is slightly less ergonomic than if (entry.IsEncrypted), but it's still reasonable. We could also consider adding IsEncrypted as a bool, only storing that bool for now; then when more encryption functionality is added, use IsEncrypted as a convenience over Encryption != ZipEntryEncryption.None.

bartonjs commented 2 years ago

Video

Looks good as proposed. It seems like there's some missing user story around what to do when it's true, but it does provide incremental value.

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public bool IsEncrypted { get; }
    }
}
deeprobin commented 2 years ago

// This will no longer occur for encrypted entries

Should we mark this as a breaking-change? If the InvalidDataException is no longer thrown isn't that one?

jeffhandley commented 2 years ago

// This will no longer occur for encrypted entries

Should we mark this as a breaking-change? If the InvalidDataException is no longer thrown isn't that one?

Sorry; that code comment was slightly misleading. This isn't a breaking change; ZipArchiveEntry will still throw InvalidDataException, but the code sample shown above will not make the call that would have resulted in that exception. Previously, there was no way for the calling code to know whether or not the contents could be copied out; now the caller can.

Thanks for noticing that detail though and calling it out, @deeprobin.