dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

[API] StreamWriter .ctor pooling overloads #23486

Open benaadams opened 7 years ago

benaadams commented 7 years ago

Motivation

As StreamWriter requires buffers that live longer than a function lifetime (i.e. buffer lifetime is longer than single entry function scope - class field vs local); there isn't a way to transparently use the ArrayPool without regressing either performance or safety.

As a result of this the option is to make buffer pooling a conscious opt-in choice by the user; so it is safe by default and can be more performant on the opt-in.

Proposal

https://github.com/dotnet/corefx/issues/23874#issuecomment-333624280

public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false,
                 ArrayPool<char> charPool = null,
                 ArrayPool<byte> bytePool = null);
}

Previous proposal

public partial class StreamWriter
{
    StreamWriter(Stream stream, bool useBufferPooling)
    StreamWriter(Stream stream, bool useBufferPooling, Encoding encoding)
    StreamWriter(Stream stream, bool useBufferPooling, Encoding encoding, int bufferSize)
    StreamWriter(Stream stream, bool useBufferPooling, Encoding encoding, int bufferSize, bool leaveOpen)
}

Comment

Similar pattern could be extended to other areas with longer lived buffers, like FileStream etc

/cc @stephentoub @JeremyKuhne @jkotas @KrzysztofCwalina

justinvp commented 7 years ago
  1. Do we really need 5 new overloads? How about a single new overload with one additional parameter at the end, and/or use optional parameters with default values?

  2. Instead of the bool useBufferPooling parameter, another option to consider is having an ArrayPool<byte> pool parameter that would allow someone to pass ArrayPool<byte>.Shared (or some other ArrayPool<byte>) to opt-in to pooling, along the lines of dotnet/runtime#22428. There is a slight usability downside, though: someone may want to use pooling but not know about the existence of ArrayPool<byte>.Shared as the typical instance to pass-in.

  3. What about the ctors that take a path instead of Stream?

benaadams commented 7 years ago

How about a single new overload with one additional parameter at the end and/or use optional parameters with default values?

Can't add optionals with same parameter overloads as it will clash with existing methods; which is why I went for second param.

use optional parameters with default values?

Default Encoding is private UTF8NoBOM; default bufferSizes are private, so would need to be exposed; something like

public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 bool useBufferPooling,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false);
}

What about the ctors that take a path instead of Stream?

Issue with FileStream not having pooling; if FileStream has pooled overloads, does it pass through pooling params etc. Currently I think it suggests pooling would go all the way down, when it wouldn't.

Instead of the bool useBufferPooling parameter, another option to consider is having an ArrayPool pool

Need to add two pools ArrayPool<byte> and ArrayPool<char> so something like

public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false,
                 ArrayPool<char> charPool = ArrayPool<char>.Shared,
                 ArrayPool<byte> bytePool = ArrayPool<byte>.Shared);
}

Though passing the pools with defaults would definitely clash with existing methods; unless no default for one of the params was taken and order was changed, so it would need to be bufferSize or leaveOpen that had no default rather than Encoding which seems a bit weird.

JeremyKuhne commented 7 years ago

Sorry, I've been out for a bit. I'm keen on the idea of actually having a constructor that takes ArrayPool. Your sample is a reasonable place to start an API review, I think:

public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false,
                 ArrayPool<char> charPool = ArrayPool<char>.Shared,
                 ArrayPool<byte> bytePool = ArrayPool<byte>.Shared);
}

@pjanotti, what do you think?

benaadams commented 7 years ago

Constructor taking ArrayPool means no possibility of devirtualization to the actual Shared pool type; but seems reasonable.

benaadams commented 7 years ago

Updated proposal

benaadams commented 7 years ago

Actually should it be

    StreamWriter(Stream stream,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false,
                 ArrayPool<char> charPool = null,
                 ArrayPool<byte> bytePool = null);

To disable pooling by default?

JeremyKuhne commented 7 years ago

disable pooling by default?

I think so, yes.

JeremyKuhne commented 7 years ago

Note that allowing custom ArrayPool<> implementations enables using the either the default pool or a custom instance. Given that the class is abstract users can fully optimize the behavior- writing a pool implementation that always gives back the same buffer, or one that gives back the same thread local buffer, etc.

pjanotti commented 7 years ago

I like the latest round (single constructor, disabled by default). The main thing that I still see for some debate is if we want pooling disabled by default or not...

Similar pattern could be extended to other areas with longer lived buffers, like FileStream etc

Yes, this an important point, and that should be considered in regards if pooling is enabled/disabled by default (see also dotnet/runtime#22428 mentioned by @justinvp)

benaadams commented 7 years ago

The main thing that I still see for some debate is if we want pooling disabled by default or not...

The choice that seemed to be settled on in the PR and lead to this api was (other than custom pools)

  1. On by default; take a hit on every operation for thread safety in terms of pooled buffer (Interlocked)
  2. Off by default; don't take a hit for thread safety

Either way the type wouldn't be thread safe; and would go wrong for multiple threads; its just whether the use of the pooled array is more safe at a performance cost.

The main issue is the char buffer lifetime as it crosses multiple function calls.

pjanotti commented 7 years ago

I agree with that. I am just trying to make clear that this is the choice being made here.

terrajobst commented 7 years ago

Video

Couple of thoughts:

karelz commented 7 years ago

FYI: The API review discussion was recorded - see https://youtu.be/b96co3sNzNI?t=2549 (53 min duration)

Joe4evr commented 7 years ago

Saw this on the aforementioned discussion recording. If you're concerned with tying to the implementation of ArrayPool<T>, why not abstract it out to a new interface? That seems like the obvious thing to consider, at least.

karelz commented 7 years ago

That would be a way to go. The trick is to create such API and convince yourself that it is flexible enough to cover all/most future requests and needs, incl. isolation of pools, etc. And ensure we avoid unnecessary complexity and overengineering. In general, it is better to wait for a strong use case, which can then drive the design. In this case the suggestion is to use suboptimal bool approach for now and wait for more use cases to drive the "right" design.

danmoseley commented 6 years ago

API complete for 2.1

JeremyKuhne commented 4 years ago

Triage: We need a proposal on an interface that allows providing buffers. Same issue came up with SequenceReader as we want a way to allocate arrays to provide results on the allocating APIs that return spans across sequences.

TrayanZapryanov commented 2 months ago

I've faced same problem when tried to use StreamWriter for populating zip entries. There for each entry stream I am creating new StreamWriter to fill some data. Buffer allocations became one of major type due to this issue. I really hope for some workaround, as I have plans to archive terabytes of data and would like to be as GC friendly as possible.