Open dersia opened 1 year ago
Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.
Author: | dersia |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.IO`, `untriaged` |
Milestone: | - |
Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.
Author: | dersia |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Memory`, `untriaged` |
Milestone: | - |
I don't understand the proposal. What is the behavior when not copying? Are you proposing storing the Stream object and then reading from it later when the BinaryData is consumed? That seems like a non-starter, as for example it would push a lot of possibly asynchronous and exceptional work into existing members, like the implicit cast to span.
@stephentoub Sorry for the late reply.
Basically yes, I am suggesting to store the Stream object and probably with an additional leaveOpen
argument.
I would like to clarify why I am suggesting this.
The problem that we are having is that we have an http endpoint that takes files. these files need to be uploaded to azure blob-storage and we also need to set the ContentType of those blobs. The Azure SDKs that do upload to Azure Storage Account have multiple overloads, for example one that takes a stream and also one that takes BinaryData. The feature that was added in https://github.com/dotnet/runtime/pull/89605 allows for BinaryData to also set the ContentType during upload and we want to make use of this feature. When using the Stream overload on the BlobClient, we would also need to make another call to update the Blob Properties so we can set the ContentType. Having said that, there is already an overload on BinaryData
which takes a Stream as an argument, the problem with that overload is, that it copies the original stream and this does not work with lager files and we ran into memory exceptions.
As far as I understand the Runtime has already APIs that take Streams and handle correct behavior with streams, so I don't quite understand what would make this different or exceptional in that regard, but then again, you are the expert and I hope you can enlighten me here.
Also the overload that takes BinaryData
with a stream argument as of today is internally creates a MemoryStream
internally and copys the original stream and hase to then work with that stream, which is converted to a span so I don't quite understand why it couldn't do the same with the original stream?
Sorry again for the late reply and thank you, Sia
As far as I understand the Runtime has already APIs that take Streams and handle correct behavior with streams, so I don't quite understand what would make this different or exceptional in that regard
The problem is the rest of the BinaryData surface area assumes the type stores all the data, which then means that methods like ToArray, ToMemory, and the implicit cast to ReadOnlyMemory<T>
would all start doing I/O, not only significantly changing their performance profile, but also leading to I/O that likely should have been done asynchronously now either being done synchronously, or worse, being done asynchronously but blocking (sync over async).
Our aspnet web api backend takes files from the frontend and uploads those to Azure Storage Blobs.
Have you considered using pre-signed blob upload links, send those to the frontend, and have it upload directly to Azure through them?
I know this doesn't directly answer your proposal, but figured I'd mention anyways in case you were not aware of this option which tends to be substantially more efficient than roundtripping all the data twice via an indirection (your API).
cc: @seanmcc-msft
Background and motivation
Our aspnet web api backend takes files from the frontend and uploads those to Azure Storage Blobs.
We use BlobClient.Upload and we also have to call update on the properties to set the content-type of the blob. With the new Media-Type on
BinaryData
(https://github.com/dotnet/runtime/pull/89605) we would like to take advantage of this feature so don't have to make multiple calls on the BlobClient, and in general I think there is benefit in usingBinaryData
overall. But as of todayBinaryData.FromStream()
copies the stream as you can see here https://github.com/dotnet/runtime/blob/f60757ad40a36b44d471f1cb72614368eb51a1fd/src/libraries/System.Memory.Data/src/System/BinaryData.cs#L173-L208Because of this
BinaryData
is not useable for us. So I would like to suggest an API that would not copy the stream.This makes
BinaryData
convenient to use and avoid extra allocation.API Proposal
API Usage
Alternative Designs
Alternatively
FromStream
could be overloaded but I think this would be a source breaking change. Another alternative would be to make FromStream not to copy at all, but I think this will be a breaking behavior and might lead to bugs, where it "just" works for users even if the underlying stream was closed in the meantime.Risks
If called with copyContent false, this could lead to an exception, if the underlying stream is closed, but I think this is something that users are familiar with and with the bool parameter explicitly set to false, users are opting into this behavior and therefor it should be fine, if an exception is thrown.