dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.05k stars 4.69k forks source link

[API Proposal]: Add ZipArchiveEntry.Open overload #101243

Open PaulusParssinen opened 5 months ago

PaulusParssinen commented 5 months ago

Background and motivation

To unblock #1544, in the issue @ericstj suggested adding this method overload, I'll quote his good reasoning/background here:

I've noticed the same. In general ZipArchiveEntry.Open is very non-intuitive in its behavior.

For read only / write only you get a wrapped DeflateStream which doesn't tell you the length of the stream nor permit you to seek it. For read/write (update) ZipArchiveEntry will read and decompress the entire entry into memory (in fact, into a memory stream backed by a single contiguous managed array) so that you have a seek-able representation of the file. Once opened for update the file is then written back to the archive when the archive itself is closed.

I agree with @.qmfrederik here that we need a better API. Rather than rely solely on the archive's open mode we should allow for the individual call's to Open to specify what kind of stream they want. We can then check that against how the archive was opened in case it is incompatible and throw. Consider the addition:

API Proposal

namespace System.IO.Compression;

public partial class ZipArchiveEntry
{
    public Stream Open(FileAccess access);
}

For an archive opened with ZipArchiveMode.Update we could allow FileAccess.Read, FileAccess.Write, or FileAccess.ReadWrite, where only the latter would do the MemoryStream expansion. Read and write would be have as they did today. In addition to solving the OP issue, this would address the case where someone is opening an archive for Update and simply adding a single file: we shouldn't need to copy that uncompressed data into memory just to write it to the archive.

API Usage

For example, the blocked usage in #101238 would with this API be:

using FileStream zipToOpen = new FileStream("release.zip", FileMode.OpenOrCreate);
using ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update);

ZipArchiveEntry readmeEntry = archive.CreateEntry("Readme.txt");
// NEW: Avoid in-memory MemoryStream buffer by specifying the Stream operation is write-only.
using StreamWriter writer = new StreamWriter(readmeEntry.Open(FileAccess.Write));

for (int i = 0; i < 250000000; i++)
{
    writer.WriteLine("Information about this package.");
    writer.WriteLine("========================");
}

Alternative Designs

I believe the original issue had discussion about "fixing" the parameterless overload to change behavior? Didn't read too much into that as it is the "expensive" alternative. This alternative would unblock the indirect OOB consumers though.

Risks

None?

dotnet-policy-service[bot] commented 5 months ago

Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.