dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.28k stars 4.73k forks source link

[API Proposal]: Add SubStream type to get a slice of a larger stream #83340

Open jozkee opened 1 year ago

jozkee commented 1 year ago

Background and motivation

It is generally useful to have a Stream wrapper for read/write a specific part of a larger Stream, and it would be beneficial to provide an implementation that can be easily used and improved.

There has been multiple request asking for such a type, see https://stackoverflow.com/a/30494628/4231460. It also came up on the Twitter survey that @adamsitnik posted: https://github.com/dotnet/runtime/issues/58216#issuecomment-1105445805.

Additionally, we currently have similar internal implementations that read a portion of a stream. https://github.com/dotnet/runtime/blob/d3f7d59e8dc44e5ce22c8ec9c011a1b72d7e9e92/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs#L221

https://github.com/dotnet/runtime/blob/d3f7d59e8dc44e5ce22c8ec9c011a1b72d7e9e92/src/libraries/System.Formats.Tar/src/System/Formats/Tar/SubReadStream.cs#L15

API Proposal

namespace System.IO
{
    public sealed class SubStream : Stream
    {
        public SubStream(Stream baseStream, long length, bool canWrite) { }

        public override bool CanRead { get { } }
        public override bool CanSeek { get { } }
        public override bool CanWrite { get { } }
        public override long Length { get { } }
        public override long Position { get { } set { } }
        public override void Flush() { }
        public override int Read(byte[] buffer, int offset, int count) { }
        public override int Read(Span<byte> buffer) { }
        public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { }
        public override async ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { }
        public override int ReadByte() { }
        public override long Seek(long offset, SeekOrigin origin) { }
        public override void SetLength(long value) { }
        public override void Write(byte[] buffer, int offset, int count) { }
        public override void Write(ReadOnlySpan<byte> buffer) { }
        public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { }
        public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { }
        public override void WriteByte(byte value) { }
    }
}

Notes

I considered having a bool leaveOpen flag but it seems to me that most of the time you create a substream, you are not done working with the base stream and hence, you will almost never use leaveOpen = false.

Having the canWrite parameter is useful because you may want to pass the SubStream as read-only where the baseStream supports writing. This is how the SubReadStream in IO.Compression currently works.

I sketched a simple implementation of this proposal and used it to replace internal SubReadStream on System.Formats.Tar and System.IO.Compression. While it could be feasible to completely replace the internal types with this one, they have implementation details that will need to be removed if we want to use this more-generic type. e.g:

API Usage

Pass a subset of your current stream without buffereing.

var subStream = new SubStream(baseStream, length: 100, canWrite: true);
DoSomethingWith(subStream);
// If you want to start at an specific position you need to move it yourself
baseStream.Position += 100;
var subStream = new MySubStream(baseStream, length: 200, canWrite: true);
DoSomethingWith(subStream);

Alternative Design 1

Use a factory e.g:

public static class StreamFactory
{
    // alternative name: Slice
    public static Stream CreateSubStream(Stream baseStream, long length, bool canWrite) { }
}

Alternative Design 2

Having a length parameter for a DelegatingStream.ctor could be an alternative to having a fully-fledged SubStream implementation but that would overload (even more) the proposal in https://github.com/dotnet/runtime/issues/43139.

Risks

I'm not sure how a Seek operation on a SubStream should work. The guideline in the Stream docs is "Seeking to any location beyond the length of the stream is supported". But it appears to me that allowing seeking past the specified lenght may result in a pit of failure, but let me know what you think.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation It is generally useful to have a Stream wrapper for read/write a specific part of a larger Stream, and it would be beneficial to provide an implementation that can be easily used and improved. There has been multiple request asking for such a type, see https://stackoverflow.com/a/30494628/4231460. It also came up on the Twitter survey that @adamsitnik posted: https://github.com/dotnet/runtime/issues/58216#issuecomment-1105445805. Additionally, we currently have similar internal implementations that read a portion of a stream. https://github.com/dotnet/runtime/blob/d3f7d59e8dc44e5ce22c8ec9c011a1b72d7e9e92/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs#L221 https://github.com/dotnet/runtime/blob/d3f7d59e8dc44e5ce22c8ec9c011a1b72d7e9e92/src/libraries/System.Formats.Tar/src/System/Formats/Tar/SubReadStream.cs#L15 ### API Proposal ```cs namespace System.IO { public sealed class SubStream : Stream { public SubStream(Stream baseStream, long length, bool canWrite) { } public override bool CanRead { get { } } public override bool CanSeek { get { } } public override bool CanWrite { get { } } public override long Length { get { } } public override long Position { get { } set { } } public override void Flush() => _baseStream.Flush(); public override int Read(byte[] buffer, int offset, int count) { } public override int Read(Span buffer) { } public override Task ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { } public override async ValueTask ReadAsync(Memory buffer, CancellationToken cancellationToken = default) { } public override int ReadByte() { } public override long Seek(long offset, SeekOrigin origin) { } public override void SetLength(long value) { } public override void Write(byte[] buffer, int offset, int count) { } public override void Write(ReadOnlySpan buffer) { } public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { } public override ValueTask WriteAsync(ReadOnlyMemory buffer, CancellationToken cancellationToken = default) { } public override void WriteByte(byte value) { } } } ``` ### Notes I considered having a `bool leaveOpen` flag but it seems to me that most of the time you create a substream, you are not done working with the base stream and hence, you will almost never use `leaveOpen = false`. Having the `canWrite` parameter is useful because you may want to pass the SubStream as read-only where the baseStream supports writing. This is how the SubReadStream in IO.Compression currently works. I sketched a simple implementation of this proposal and used it to replace internal SubReadStream on System.Formats.Tar and System.IO.Compression. While it could be feasible to completely replace the internal types with this one, they have implementation details that will need to be removed if we want to use this more-generic type. e.g: * SubReadStream from Tar has `HasReachedEnd` property and it throws `EndOfStreamException` when superStream advances to next entry. This is an implementation detail we don't want to expose in this API. * SubReadStream from Zip throws `NotSupportedException` when attempting to write/flush/seek the SubStream (the base stream does support it). Throwing on write can be addressed by the `bool canWrite` param in the ctor but I'm not sure if we want to expose `bool canSeek` and `bool canFlush` as well. ### API Usage Pass a subset of your current stream without buffereing. ```cs var subStream = new SubStream(baseStream, length: 100, canWrite: true); DoSomethingWith(subStream); ``` ### Alternative Designs Having a `length` parameter for a `DelegatingStream.ctor` could be an alternative to having a fully-fledged SubStream implementation but that would overload (even more) the proposal in https://github.com/dotnet/runtime/issues/43139. ### Risks N/A
Author: Jozkee
Assignees: -
Labels: `api-suggestion`, `area-System.IO`
Milestone: -
danmoseley commented 1 year ago

Would it make sense for the constructor to take an offset rather than seeking manually?

Would the wrapper stream verify before each operation that the offset of the underlying stream didn't get moved underneath it? Would that throw?

jozkee commented 1 year ago

Would it make sense for the constructor to take an offset rather than seeking manually?

If we do that, that .ctor would be useless for unseekable streams. Not saying that's a bad thing, though.

Would the wrapper stream verify before each operation that the offset of the underlying stream didn't get moved underneath it?

We could do it or we couldn't, even perhaps we could have a bool arg (ensurePosition) that specifies which behavior a user wants. I looked at BufferedStream to see how it reacts to the base stream moving, and it does not validate it. Same with other streams. Although, the internal SubReadStream used in System.IO.Compression does adjust the position each time Read is called. https://github.com/dotnet/runtime/blob/d3f7d59e8dc44e5ce22c8ec9c011a1b72d7e9e92/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs#L292-L293

jeffhandley commented 1 year ago

We're moving this out to Future and it will be considered again for .NET 9.