Open benaadams opened 7 years ago
Do we really need 5 new overloads? How about a single new overload with one additional parameter at the end, and/or use optional parameters with default values?
Instead of the bool useBufferPooling parameter, another option to consider is having an ArrayPool<byte> pool parameter that would allow someone to pass ArrayPool<byte>.Shared (or some other ArrayPool<byte>) to opt in to pooling, along the lines of dotnet/runtime#22428. There is a slight usability downside, though: someone may want to use pooling but not know about the existence of ArrayPool<byte>.Shared as the typical instance to pass in.
What about the ctors that take a path instead of Stream?
How about a single new overload with one additional parameter at the end and/or use optional parameters with default values?
Can't add optional parameters to overloads with the same parameter lists, as they would clash with the existing methods; that is why I went for a second parameter.
use optional parameters with default values?
The default Encoding is the private UTF8NoBOM, and the default buffer sizes are private too, so they would need to be exposed; something like:
public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 bool useBufferPooling,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false);
}
What about the ctors that take a path instead of Stream?
The issue is that FileStream doesn't have pooling; if FileStream gets pooled overloads, would the pooling parameters be passed through to it, etc.? Currently I think a pooled path overload would suggest that pooling goes all the way down, when it wouldn't.
Instead of the bool useBufferPooling parameter, another option to consider is having an ArrayPool<byte> pool
Need to add two pools, ArrayPool<byte> and ArrayPool<char>, so something like:
public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false,
                 ArrayPool<char> charPool = ArrayPool<char>.Shared,
                 ArrayPool<byte> bytePool = ArrayPool<byte>.Shared);
}
Though passing the pools with defaults would definitely clash with existing methods, unless one of the parameters had no default and the order was changed; so it would need to be bufferSize or leaveOpen that had no default rather than Encoding, which seems a bit weird.
Sorry, I've been out for a bit. I'm keen on the idea of actually having a constructor that takes ArrayPool. Your sample is a reasonable place to start an API review, I think:
public partial class StreamWriter
{
    static int DefaultBufferSize { get; }
    static int DefaultFileBufferSize { get; }
    static Encoding DefaultEncoding { get; }

    StreamWriter(Stream stream,
                 Encoding encoding = DefaultEncoding,
                 int bufferSize = DefaultBufferSize,
                 bool leaveOpen = false,
                 ArrayPool<char> charPool = ArrayPool<char>.Shared,
                 ArrayPool<byte> bytePool = ArrayPool<byte>.Shared);
}
@pjanotti, what do you think?
A constructor taking ArrayPool means no possibility of devirtualization to the actual Shared pool type, but it seems reasonable.
Updated proposal
Actually should it be
StreamWriter(Stream stream,
             Encoding encoding = DefaultEncoding,
             int bufferSize = DefaultBufferSize,
             bool leaveOpen = false,
             ArrayPool<char> charPool = null,
             ArrayPool<byte> bytePool = null);
To disable pooling by default?
disable pooling by default?
I think so, yes.
Note that allowing custom ArrayPool<> implementations enables using either the default pool or a custom instance. Given that the class is abstract, users can fully optimize the behavior: writing a pool implementation that always gives back the same buffer, or one that gives back the same thread-local buffer, etc.
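To make the "always gives back the same buffer" idea concrete, here is a minimal sketch of a custom pool. `SingleBufferPool<T>` and `SingleBufferPoolDemo` are hypothetical names invented for this illustration; only `ArrayPool<T>` and its abstract `Rent`/`Return` members come from `System.Buffers`. The sketch assumes single-threaded use and is not a proposed implementation.

```csharp
using System;
using System.Buffers;

// Hypothetical pool for illustration only: it caches a single array and is
// NOT thread-safe, so it only suits a writer used from one thread at a time.
public sealed class SingleBufferPool<T> : ArrayPool<T>
{
    // An empty array means "nothing cached".
    private T[] _cached = Array.Empty<T>();

    public override T[] Rent(int minimumLength)
    {
        if (minimumLength < 0)
            throw new ArgumentOutOfRangeException(nameof(minimumLength));

        T[] buffer = _cached;
        if (buffer.Length > 0 && buffer.Length >= minimumLength)
        {
            // Hand out the cached array; nothing is cached until it comes back.
            _cached = Array.Empty<T>();
            return buffer;
        }

        // Cache is empty or too small: fall back to a fresh allocation.
        return new T[minimumLength];
    }

    public override void Return(T[] array, bool clearArray = false)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));

        if (clearArray)
            Array.Clear(array, 0, array.Length);

        // Keep the returned array only if it is larger than what is cached.
        if (array.Length > _cached.Length)
            _cached = array;
    }
}

public static class SingleBufferPoolDemo
{
    // Rents, returns, and rents again; reports whether the same array came back.
    public static bool ReusesBuffer()
    {
        var pool = new SingleBufferPool<char>();
        char[] first = pool.Rent(1024);
        pool.Return(first);
        char[] second = pool.Rent(512);
        return ReferenceEquals(first, second);
    }
}
```

A pool like this would be passed as the charPool or bytePool argument of the constructor shape discussed above, if that shape were adopted. A thread-local variant would presumably keep the cached array in a [ThreadStatic] field instead of an instance field.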
I like the latest round (single constructor, disabled by default). The main thing that I still see for some debate is if we want pooling disabled by default or not...
Similar pattern could be extended to other areas with longer lived buffers, like FileStream etc
Yes, this is an important point, and it should be considered when deciding whether pooling is enabled or disabled by default (see also dotnet/runtime#22428 mentioned by @justinvp)
The main thing that I still see for some debate is if we want pooling disabled by default or not...
The choice that seemed to be settled on in the PR and led to this API was (other than custom pools)
Either way the type wouldn't be thread safe, and would go wrong for multiple threads; it's just whether the use of the pooled array is more safe at a performance cost.
The main issue is the char buffer lifetime as it crosses multiple function calls.
I agree with that. I am just trying to make clear that this is the choice being made here.
Couple of thoughts:
pool
because it allows us to change the implementation
FYI: The API review discussion was recorded - see https://youtu.be/b96co3sNzNI?t=2549 (53 min duration)
Saw this on the aforementioned discussion recording. If you're concerned with tying to the implementation of ArrayPool<T>, why not abstract it out to a new interface? That seems like the obvious thing to consider, at least.
That would be a way to go. The trick is to create such API and convince yourself that it is flexible enough to cover all/most future requests and needs, incl. isolation of pools, etc. And ensure we avoid unnecessary complexity and overengineering.
In general, it is better to wait for a strong use case, which can then drive the design. In this case the suggestion is to use the suboptimal bool approach for now and wait for more use cases to drive the "right" design.
API complete for 2.1
Triage: We need a proposal for an interface that allows providing buffers. The same issue came up with SequenceReader, as we want a way to allocate arrays to provide results on the allocating APIs that return spans across sequences.
I've faced the same problem when trying to use StreamWriter to populate zip entries. For each entry stream I create a new StreamWriter to write some data, and buffer allocations became one of the major allocation types because of this issue. I really hope for some workaround, as I plan to archive terabytes of data and would like to be as GC friendly as possible.
Motivation
As StreamWriter requires buffers that live longer than a function lifetime (i.e. buffer lifetime is longer than single entry function scope - class field vs local), there isn't a way to transparently use the ArrayPool without regressing either performance or safety.
As a result of this the option is to make buffer pooling a conscious opt-in choice by the user; so it is safe by default and can be more performant on the opt-in.
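The "class field vs local" distinction can be sketched as follows. `FieldBufferWriter` and `BufferLifetimeDemo` are hypothetical types written for this illustration and are not how StreamWriter is implemented; only `ArrayPool<char>.Shared`, `Rent`, and `Return` are real APIs. The local case can pair Rent and Return inside one try/finally, while the field case has to guard every entry point, because the array goes back to the shared pool at Dispose and may be handed to unrelated code afterwards.

```csharp
using System;
using System.Buffers;

public static class BufferLifetimeDemo
{
    // Local lifetime: the rented array never escapes this method, so the
    // try/finally guarantees exactly one Return.
    public static int CountUpperLocal(string text)
    {
        char[] scratch = ArrayPool<char>.Shared.Rent(text.Length);
        try
        {
            text.CopyTo(0, scratch, 0, text.Length);
            int count = 0;
            for (int i = 0; i < text.Length; i++)
            {
                if (char.IsUpper(scratch[i])) count++;
            }
            return count;
        }
        finally
        {
            ArrayPool<char>.Shared.Return(scratch);
        }
    }
}

// Field lifetime (hypothetical type): the rented array lives across many
// Append calls, so correctness depends on callers disposing exactly once
// and never touching the instance afterwards.
public sealed class FieldBufferWriter : IDisposable
{
    private readonly char[] _buffer;
    private int _count;
    private bool _disposed;

    public FieldBufferWriter(int size)
    {
        _buffer = ArrayPool<char>.Shared.Rent(size);
    }

    public void Append(char c)
    {
        // Without this check, a call after Dispose would write into an array
        // that the pool may already have handed to someone else.
        if (_disposed) throw new ObjectDisposedException(nameof(FieldBufferWriter));

        if (_count == _buffer.Length) _count = 0; // stand-in for a real flush
        _buffer[_count++] = c;
    }

    public void Dispose()
    {
        // Without this check, a second Dispose would return the same array
        // twice, letting two later renters share it.
        if (_disposed) return;
        _disposed = true;
        ArrayPool<char>.Shared.Return(_buffer);
    }
}
```

The guards add a check to each call, and they do not help when two threads race on the same instance. That is the performance-versus-safety tension the motivation describes.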
Proposal
https://github.com/dotnet/corefx/issues/23874#issuecomment-333624280
Previous proposal
Comment
Similar pattern could be extended to other areas with longer lived buffers, like FileStream etc
/cc @stephentoub @JeremyKuhne @jkotas @KrzysztofCwalina