Closed jeffhandley closed 2 years ago
Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.
Author: | jeffhandley |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.IO.Compression` |
Milestone: | 7.0.0 |
For the record, PKWare strong encryption requires a license, see https://github.com/dotnet/runtime/issues/1545#issuecomment-566344263
The PKZIP encryption seems to require a license to implement. The WinZip encryption scheme does not.
WinZIP encryption doesn't use the bit 6, it uses an extra data field with header 0x9901.
Now, API-wise, I think bool IsEncrypted
can be a convenient API to just skip the encrypted entries for now, but once we address reading and creation of encrypted entries (which is very likely given the popularity of the issue), we may need another field to describe the AES strength (128, 192, or 256 bit).
My point being that once we extend the encryption support, bool IsEncrypted
field will be redundant (and somewhat obsolete) compared to an enum that tells you more about the encryption information of the entry.
Yet another alternative, similar to your "Exposing the Entry Encryption Level" design, could be:
namespace System.IO.Compression
{
public class ZipArchiveEntry
{
+ public ZipEntryEncryption Encryption { get; }
}
+ public enum ZipEntryEncryption
+ {
+ None,
+ Weak, // ZipCrypto
+ Aes128,
+ Aes192,
+ Aes256,
+ Unknown = 0xFFFF,
+ }
}
Thanks, @Jozkee; I had not read the WinZip documentation you linked to.
With the enum approach, callers that needs to determine if an entry is encrypted would use if (entry.Encryption != ZipEntryEncryption.None)
, which is slightly less ergonomic than if (entry.IsEncrypted)
, but it's still reasonable. We could also consider adding IsEncrypted
as a bool, only storing that bool for now; then when more encryption functionality is added, use IsEncrypted
as a convenience over Encryption != ZipEntryEncryption.None
.
Looks good as proposed. It seems like there's some missing user story around what to do when it's true
, but it does provide incremental value.
namespace System.IO.Compression
{
public class ZipArchiveEntry
{
public bool IsEncrypted { get; }
}
}
// This will no longer occur for encrypted entries
Should we mark this as a breaking-change
? If the InvalidDataException is no longer thrown isn't that one?
// This will no longer occur for encrypted entries
Should we mark this as a
breaking-change
? If the InvalidDataException is no longer thrown isn't that one?
Sorry; that code comment was slightly misleading. This isn't a breaking change; ZipArchiveEntry
will still throw InvalidDataException
, but the code sample shown above will not make the call that would have resulted in that exception. Previously, there was no way for the calling code to know whether or not the contents could be copied out; now the caller can.
Thanks for noticing that detail though and calling it out, @deeprobin.
Background and motivation
By specification, Zip archives support "weak" and "strong" entry-level encryption. Weak encryption of an entry (via the ZipCrypto algorithm) is indicated in the entry-level General Purpose Bit Flag value by setting bit 0 to
1
. Strong encryption (e.g. AES-*) is indicated with both bits 0 and 6 (PKZIP) or by setting bit 0 and using an extra data field (WinZip). With any encryption scheme, General Purpose Bit Flag bit 0 will be set if an entry is encrypted.Our APIs do not currently expose any indicators for whether an archive contains any weak- or strong-encrypted entries, nor do we expose the General Purpose Bit Flag for advanced scenarios that require checking bits within that value. The end result is that calling code cannot determine if an entry is encrypted. Furthermore, when attempting to extract an encrypted entry from a
ZipArchive
, two different behaviors are observed:InvalidDataException
is thrown when attempting to extract the entry. This exception type is also thrown in other circumstances, and there is no discriminating information to indicate the reason for the exception; thus, there is no means for identifying that the entry was encrypted.Issue #57197 requests a means for detecting if a
ZipArchiveEntry
is encrypted while iterating over the entries in the archive.API Proposal
API Usage
Additional APIs to Consider
Expose an Archive-Level Property
We could additionally include an archive-level property to indicate if any of the entries are encrypted. However, since the encryption information is stored at the entry-level "General Purpose Bit Flag" value, populating this property would require iterating over the entries as is shown in the usage example above. If demand for this archive-level property exists, it could be added to
ZipArchive
alongside the existingEntries
andMode
properties. BecauseZipArchive
currently contains no other aggregate information about the entries though, I recommend we do not add this property unless sufficient requests are received after adding the entry-level property.Specify Encryption Handling Behavior to ZipFile Accelerators
The
ZipFile
class provides static accelerator methods, includingOpen
,OpenRead
, andExtractToDirectory
. TheOpen
andOpenRead
methods simply return aZipArchive
without taking any other action and they succeed with weak- or strong-encrypted entries present. TheExtractToDirectory
method exhibits the same behavior as described above, with anInvalidDataException
being thrown for strong-encrypted entries and silently extracting encrypted entries for weak encryption.We could add overloads to
ZipFile.ExtractToDirectory
to specify how encrypted entries should be handled. I recommend we do not pursue this unless sufficient demand emerges. At that time, we could learn more about the scenarios to determine if we need an enum to define different behaviors or if we merely need to allow a boolean to indicate encrypted entries should be skipped. Without these overloads, a caller can instead useZipFile.Open[Read]
and iterate over the entries.Alternative Designs
Exposing the Entry Encryption Algorithm
An alternative design would use an enum to indicate the encryption algorithm used for an entry:
For Add password to ZipArchive · Issue #1545 · dotnet/runtime, an enum along these lines is very likely to be needed. Without fully designing the APIs for those features though, it's possible that the enum illustrated above might not be the right design for the full feature set. Since the enum is not strictly necessary for detecting if there are password-protected/encrypted entries, I recommend we stick with the simple boolean for now. If #1545 is implemented and we end up adding an
Encryption
property that uses an enum similar to what's shown here, then theIsEncrypted
property would become a convenience method over checking ifEncryption
isZipEntryEncryption.None
or another value. For the initial implementation though,IsEncrypted
would only need to be based on General Purpose Bit 0 being set.Exposing the General Purpose Bit Flag Value
We could optionally expose the General Purpose Bit Flag value on
ZipArchiveEntry
, either in addition to or instead ofIsEncrypted
. Exposing this value would allow callers to check other bits to glean more about the archive entry than our APIs support.From there, we could also ostensibly introduce an enum that names each of the bits per the specification. There has not been demand for this low-level data to be exposed however, so I recommend we do not pursue this approach unless demand emerges.
netstandard support via Exception.Data
Some of the requests for this functionality came with the desire for netstandard support. Without adding new API surface, a failed extraction could include data on the exception instance that indicates why the
InvalidDataException
was thrown. For instance,Exception.Data
could include a key/value pair of "IsEncrypted" and the boolean indicating if the entry was encrypted. However, this would rely on exception handling to control flow, which is an anti-pattern.Risks
The Zip file format also includes the ability to encrypt some of the central directory metadata, including file names. This API design does not take that concept into account. Popular tools such as WinZip, 7-Zip, and WinRAR do not support this capability either though. More typically, a password is set for the zip such that it is applied to each of the entries individually, and the central directory metadata remains unencrypted.
Based on the specification for how central directory metadata encryption is implemented, the entry count values are to be obfuscated if the central directory is encrypted (reference). That expectation would likely introduce other issues with the current ZipArchive implementation if we sought to support encryption of the central directory. However, the file entry header General Purpose Bit Flag would remain the same, with bit 0 still indicating that the entry is also encrypted; thus the proposed design would still apply.