aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0

TransferUtil.UploadAsync does not work with unseekable streams or zero length streams #675

Closed: NinoFloris closed this issue 1 year ago

NinoFloris commented 7 years ago

Expected Behavior

I should be able to pass through a stream that has no known length (or zero length) and at least get decent exceptions; preferably the SDK should handle it for the user. Any other scenario requires us to buffer the full upload into a stream that can do it all (seeking, accurate length reporting, etc.) before we can hand it to the S3 SDK. That's not nice in a web server, and it feels like we have to babysit the AWS SDK so it behaves...

Current Behavior

The request terminates with a NotSupportedException because S3 tries to read the stream's length somewhere without catching the exception. The stack trace is posted below under "Stacktrace for length NotSupported exception".

Steps to Reproduce (for bugs)

Create an ASP.NET project and add a POST endpoint that does a simple upload transfer (just binary, no form data).

What also helps, since a non-seekable stream is not always practical, is to add the following line; it is a helper that spools the request body to disk once it reaches a threshold:

HttpContext.Request.EnableRewind(3145728); // 3 MB max in memory before spooling to disk

The problem is that with that method the S3 client hangs and finally times out with the exception shown at the bottom under "Stacktrace for length 0 case".
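
Not part of the original report, but for concreteness, a minimal sketch of such an endpoint follows; the controller shape, bucket, and key are placeholder assumptions, and the only point is that Request.Body (non-seekable, with no reliable Length) goes straight to UploadAsync:

using System.Threading;
using System.Threading.Tasks;
using Amazon.S3.Transfer;
using Microsoft.AspNetCore.Mvc;

[Route("api/upload")]
public class UploadController : Controller
{
    private readonly ITransferUtility _transferUtility;

    public UploadController(ITransferUtility transferUtility)
    {
        _transferUtility = transferUtility;
    }

    [HttpPost]
    public async Task<IActionResult> Post(CancellationToken cancellationToken)
    {
        // Optional: spool the body to disk above 3 MB instead of streaming it directly.
        // HttpContext.Request.EnableRewind(3145728);

        // Request.Body is non-seekable and has no reliable Length; this call throws.
        await _transferUtility.UploadAsync(Request.Body, "my-bucket", "my-key", cancellationToken);
        return Ok();
    }
}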

Your Environment

Product Information:
  Version: 1.0.4
  Commit SHA-1 hash: af1e6684fd

Runtime Environment:
  OS Name: Mac OS X
  OS Version: 10.12
  OS Platform: Darwin
  RID: osx.10.12-x64
  Base Path: /usr/local/share/dotnet/sdk/1.0.4


## Context:
Most streams don't allow you to set a length, and neither of these streams can track their length reliably, because they sometimes don't know it yet (think chunked requests):
- FrameRequestStream https://github.com/aspnet/KestrelHttpServer/blob/dev/src/Microsoft.AspNetCore.Server.Kestrel.Core/Internal/Http/FrameRequestStream.cs
- FileBufferingReadStream https://github.com/aspnet/HttpAbstractions/blob/master/src/Microsoft.AspNetCore.WebUtilities/FileBufferingReadStream.cs

These streams are no exception, which is why you probably put quite a few consumers of your SDK in a tough spot by reading the ContentLength for your own S3 request from the stream. That should be re-evaluated, either by allowing us to pass in a ContentLength ourselves somewhere, or by doing something like a chunked upload so you don't need to set a Content-Length header and thus have no need for a ContentLength from the stream.

Also, something apparently blows up completely when the stream length that was passed in is 0, which I could reproduce easily. That is worth checking out irrespective of your decision or the outcome of this issue.

## Stacktrace for length NotSupported exception:
  An unhandled exception has occurred: Specified method is not supported.

System.NotSupportedException: Specified method is not supported.
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.FrameRequestStream.get_Length()
   at Amazon.S3.Transfer.TransferUtilityUploadRequest.get_ContentLength()
   at Amazon.S3.Transfer.TransferUtility.IsMultipartUpload(TransferUtilityUploadRequest request)
   at Amazon.S3.Transfer.TransferUtility.GetUploadCommand(TransferUtilityUploadRequest request, SemaphoreSlim asyncThrottler)
   at Amazon.S3.Transfer.TransferUtility.UploadAsync(Stream stream, String bucketName, String key, CancellationToken cancellationToken)
   ... our code ...


## Stacktrace for length 0 case (it took many seconds before it responds with this error):
  An unhandled exception has occurred: The provided 'x-amz-content-sha256' header does not match what was computed.

Amazon.S3.AmazonS3Exception: The provided 'x-amz-content-sha256' header does not match what was computed. ---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
   at Amazon.Runtime.HttpWebRequestMessage.d20.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler_mobile\HttpRequestMessageFactory.cs:line 404
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.HttpHandler`1.d91.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler\HttpHandler.cs:line 175
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.RedirectHandler.<InvokeAsync>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.Unmarshaller.d3`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.S3.Internal.AmazonS3ResponseHandler.d11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.ErrorHandler.<InvokeAsync>d__51.MoveNext()
--- End of inner exception stack trace ---
   at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleException(IExecutionContext executionContext, HttpErrorResponseException exception) in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\HttpErrorResponseExceptionHandler.cs:line 60
   at Amazon.Runtime.Internal.ErrorHandler.ProcessException(IExecutionContext executionContext, Exception exception) in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\ErrorHandler.cs:line 212
   at Amazon.Runtime.Internal.ErrorHandler.d5`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\ErrorHandler.cs:line 104
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.d91.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Amazon.Runtime.Internal.RetryHandler.<InvokeAsync>d__101.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\RetryHandler\RetryHandler.cs:line 131
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.d9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.d91.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.S3.Internal.AmazonS3ExceptionHandler.<InvokeAsync>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.d5`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\Handlers\ErrorCallbackHandler.cs:line 58
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.MetricsHandler.d1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.S3.Transfer.Internal.SimpleUploadCommand.d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   ... our code ...

sstevenkang commented 7 years ago

Is there a reason why you are not making a direct put object request? Transfer utility is for concurrent multipart upload. Also, if you could provide sample code, that might help us understand the use case.

NinoFloris commented 7 years ago

Yes, because it is in our filesystem abstraction layer, which also deals with bigger files, sometimes from disk, so TransferUtility is (conceptually) a good fit for us.

I'll see if I can whip up something with aspnet core abstractions so it's small and clear

NinoFloris commented 7 years ago

Here you go https://github.com/NinoFloris/awsrepro675

EDIT: Just tested the FileBufferingReadStream and a non-seekable stream with PutObjectRequest as well: the exact same errors.

NinoFloris commented 7 years ago

@sstevenkang as you can read, those errors occur not only with TransferUtility but also with PutObjectRequest. I hope this warrants a different label than "Question".

normj commented 7 years ago

I understand your use case, but S3 has always required that you set the Content-Length whenever you PUT an object. Here are the S3 docs which point out that Content-Length is required:

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html

We also compute the SHA256 of the object which is passed as a header for data integrity.

This team and repo don't control how the S3 API works, just how it is interfaced with .NET. If you want to dive deeper into S3's design, I suggest you reach out on the S3 forums:

https://forums.aws.amazon.com/forum.jspa?forumID=24

NinoFloris commented 7 years ago

The problem is not that we don't know the Content-Length, but that the SDK assumes the stream actually has and knows that length. This assumption is wrong and the behavior unidiomatic. There should be a way to give a content length to the SDK as an argument separate from the stream. Currently, as far as I could tell, most of this is internal and just pokes the stream for its length.

sstevenkang commented 7 years ago

I am a little confused by the comment, and also by the comment in the sample code:

There should be a way to give a content length to the sdk as an argument separate from the stream. Every request stream type out there will not be able to seek or give length

If the content length is known, why is the stream not seekable? That's what I'm having a hard time following. I can understand the argument that the S3 operations should support unseekable streams (which we don't, for performance reasons), but to say that it's more idiomatic for a function to take an unseekable stream and the size of the stream as parameters... seems odd.

NinoFloris commented 7 years ago

Because of the same reason ;) Kestrel (the .NET Core web server), for instance, doesn't do buffering, for performance reasons. And the buffering stream that I linked to also doesn't report a length, due to unreliable internals (it only starts buffering after the first read, for instance). It is just something that is hard to rely on; however, we always know the file size or content length from our incoming request, so we could pass a valid content length to the SDK.

We just don't always have control over the type of the input streams. If you think it shouldn't be in the SDK, we'll just have to pass some wrapper stream that takes a stream and a length: an unnecessary allocation, but so be it.

Besides all this, a content length != a stream length; the two can have completely separate meanings and should be settable separately from each other.

tapmantwo commented 7 years ago

We suffer from the same problem, particularly when we are dealing with encrypted or compressed streams. We can't read the size from the stream, or seek. We have a kind of workaround: if we use the simple PUT object for these types AND are told up front how big the file is, we can support them, but obviously the PUT request has a max size limit of 5 GB, which isn't ideal. A solution for multipart uploads would help us significantly.

Note, we are also behind an abstraction, and S3 is only ONE of the concrete implementations.

sstevenkang commented 7 years ago

I've added this as a feature request in our backlog. We'll report back once we have more news regarding this work. I appreciate the detailed report, repro, and discussion.

jarroda commented 6 years ago

I hope this issue is still in the backlog, and that there are still intentions of resolving it. We hit the same problem.

tburnett80 commented 5 years ago

Has there been any traction on this? It's been almost 2 years.

NinoFloris commented 5 years ago

We gave up waiting and now use Minio https://github.com/minio/minio, which supports not only specifying a length separately in the upload call, but also a length of -1 meaning "stream it out, I don't know how big it'll be".

ASP.NET or IIS will protect you with a maximum body size limit against gigantic uploads anyway. This is perfect for APIs that want to proxy uploads after validation, with the upload destination retrieved from config.
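
For readers who want to see the shape of that, here is a rough sketch against the minio-dotnet client. The endpoint, credentials, bucket/key, and the exact overload are assumptions on my part; the -1 convention for an unknown size is as described in the comment above.

using System.IO;
using System.Threading.Tasks;
using Minio;

public static class MinioUploadSketch
{
    // Sketch only: endpoint, credentials, bucket and object names are placeholders.
    public static async Task UploadAsync(Stream body, long lengthOrMinusOne)
    {
        var minio = new MinioClient("minio.example.com", "accessKey", "secretKey").WithSSL();

        // The size is an explicit argument, decoupled from the stream itself;
        // per the comment above, -1 can be passed when the size is not known up front.
        await minio.PutObjectAsync("my-bucket", "my-key", body, lengthOrMinusOne, "application/octet-stream");
    }
}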

TommyN commented 5 years ago

We have a similar use case where we get files from a 3rd party through http that we want to upload to S3.

If you think it shouldn't be in the SDK, we'll just have to pass some wrapper stream that takes a stream and a length: an unnecessary allocation, but so be it.

I'm testing with a wrapper stream now and it seems to work. I'm assuming the Content-Length from the HTTP server is correct. Any idea if this will crash and burn at some point?

using (var response = await httpClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
{
    var length = response.Content.Headers.ContentLength;
    using (var stream = await response.Content.ReadAsStreamAsync())
    using (var wrapper = new StreamingStreamWrapper(stream, length.Value))
    using (var fileTransferUtility = new TransferUtility(s3Client))
    {
        await fileTransferUtility.UploadAsync(wrapper, bucket, key);
    }
}

StreamingStreamWrapper:

using System.IO;

// Wraps a non-seekable source stream and reports a caller-supplied length,
// because the SDK reads Length (and may call Seek) on the input stream.
internal class StreamingStreamWrapper : Stream
{
    private readonly Stream stream;
    private readonly long length;
    private long readCount = 0;

    public StreamingStreamWrapper(Stream stream, long length)
    {
        this.stream = stream;
        this.length = length;
    }

    public override bool CanRead => true;

    // Pretend to be seekable so the SDK does not reject the stream.
    public override bool CanSeek => true;

    public override bool CanWrite => false;

    public override long Length => length;

    public override long Position { get => readCount; set => throw new System.NotSupportedException(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytes = stream.Read(buffer, offset, count);

        readCount += bytes;
        return bytes;
    }

    // Seek requests are ignored; the wrapper only supports forward reads.
    public override long Seek(long offset, SeekOrigin origin)
    {
        return 0;
    }

    public override void Flush()
    {
    }

    public override void SetLength(long value) => throw new System.NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) => throw new System.NotSupportedException();
}
p-selivanov commented 5 years ago

@TommyN I came up with a similar wrapper-stream approach. In my case it worked fine. Although when I checked my app's memory consumption for big files (more than 100 MB), it turned out that TransferUtility buffers the whole input stream into memory!!!

After some searching I found this page and tried your approach. The result is the same: it works, but does full buffering internally. So we could just do a full read into a MemoryStream and skip the wrappers.
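
For completeness, that no-wrapper fallback is just a few lines; a sketch (the transferUtility, bucket, and key names are placeholders):

using System.IO;
using System.Threading.Tasks;
using Amazon.S3.Transfer;

public static class BufferedUpload
{
    // Sketch: copy the non-seekable source into memory so TransferUtility gets a
    // seekable stream with a known Length. Only reasonable for modest payload sizes.
    public static async Task UploadAsync(ITransferUtility transferUtility, Stream source, string bucket, string key)
    {
        using (var buffer = new MemoryStream())
        {
            await source.CopyToAsync(buffer);
            buffer.Position = 0;
            await transferUtility.UploadAsync(buffer, bucket, key);
        }
    }
}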

p-selivanov commented 5 years ago

BTW, our use case is as follows:

  1. It is an ASP.NET Core API.
  2. The API accepts a request stream: images, media files (sometimes big).
  3. We do some internal logic with data from headers/cookies to figure out the target bucket name and key. We don't read the body, and we avoid full buffering.
  4. After that we forward the request body to S3, and we would be happy if the S3 SDK could do the upload without full buffering as well.

tburnett80 commented 5 years ago

@p-selivanov I am doing the same thing. Using the Azure Blob Storage APIs I could just pass the stream down the line without having to read it first to get the length.

I switched to Minio because it allows -1 to 'stream out' the content. It's just surprising, given how large a market share AWS has, that they can't handle non-seekable streams in a graceful manner. They keep making Azure a better choice for .NET developers.

aron-truvian commented 5 years ago

I'm running into this very same problem, and it's disappointing that Amazon hasn't given a solution for this yet.

My use case is that I'm taking zip files from one S3 bucket, and decompressing them into another S3 bucket via a lambda function. However, this won't work, since the stream(s) from the zip archive are not seekable.

Svisstack commented 4 years ago

Same problem, but in my case I need to calculate a SHA512 on the fly, so I need to wrap the FileStream in a CryptoStream, which of course is not seekable (as it calculates the SHA512 on the fly).

normj commented 4 years ago

For my .NET re:Invent presentation this year I needed to upload large files from the browser to S3. What I did was use multipart uploads: buffer the file into memory until I had a part's worth, then upload that part. The code can be found here:

https://github.com/aws-samples/cloudmosaic-dotnet/blob/reinvent-2019/Application/Clients/CloudMosaic.BlazorFrontend/FileUploader.cs#L37

I have been debating pulling that code out of the sample and putting it, in some form, into the SDK. Would appreciate hearing the community's thoughts on that idea.
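
For anyone who doesn't want to dig through the sample, here is a condensed sketch of that idea against the low-level multipart APIs. The class name, part handling, and error handling are my own simplifications; the linked sample remains the reference implementation.

using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static class UnseekableMultipartUploadSketch
{
    private const int PartSize = 5 * 1024 * 1024; // 5 MB, the S3 minimum part size

    // Sketch: read the non-seekable source one part at a time into a memory buffer,
    // upload each buffer as a part, then complete the multipart upload.
    // (An empty source stream would need special-casing; omitted here.)
    public static async Task UploadAsync(IAmazonS3 s3, Stream source, string bucket, string key)
    {
        var init = await s3.InitiateMultipartUploadAsync(new InitiateMultipartUploadRequest
        {
            BucketName = bucket,
            Key = key
        });

        var partETags = new List<PartETag>();
        try
        {
            var buffer = new byte[PartSize];
            int partNumber = 1;
            while (true)
            {
                // Fill the buffer from the source; Read may return short counts mid-stream.
                int filled = 0;
                while (filled < PartSize)
                {
                    int read = await source.ReadAsync(buffer, filled, PartSize - filled);
                    if (read == 0)
                    {
                        break; // end of stream
                    }
                    filled += read;
                }

                if (filled == 0)
                {
                    break; // nothing left to upload
                }

                using (var partStream = new MemoryStream(buffer, 0, filled))
                {
                    var partResponse = await s3.UploadPartAsync(new UploadPartRequest
                    {
                        BucketName = bucket,
                        Key = key,
                        UploadId = init.UploadId,
                        PartNumber = partNumber,
                        PartSize = filled,
                        InputStream = partStream
                    });
                    partETags.Add(new PartETag(partNumber, partResponse.ETag));
                }

                if (filled < PartSize)
                {
                    break; // that was the final, short part
                }
                partNumber++;
            }

            await s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
            {
                BucketName = bucket,
                Key = key,
                UploadId = init.UploadId,
                PartETags = partETags
            });
        }
        catch
        {
            // Clean up the incomplete upload so it doesn't accrue storage costs.
            await s3.AbortMultipartUploadAsync(new AbortMultipartUploadRequest
            {
                BucketName = bucket,
                Key = key,
                UploadId = init.UploadId
            });
            throw;
        }
    }
}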

NinoFloris commented 4 years ago

@normj nice! How is that even a question?

oliverjanik commented 4 years ago

What's the status here? It's been another 10 months. This is an absolute must-have.

S3 is the service to store large files, right?

Why does this API not allow a simple unseekable stream? I want to pass incoming request body (of my API) as a stream directly into S3.

Alternatively, I wouldn't mind if you exposed an API where you open a writeable stream for me that I copy data into. This would be even cleaner. Maybe like this:

            using var stream = await s3.OpenPutObjectAsync(new OpenPutObjectRequest
            {
                BucketName = bucketName,
                Key = key,
                ContentType = contentType,
                ContentLength = contentLength,
            });

           // Let me write whatever I want.

At this point in time you're forcing me to either:

mrogunlana commented 3 years ago

For my .NET re:Invent presentation this year I needed to upload large files from the browser to S3. What I did was use multipart uploads: buffer the file into memory until I had a part's worth, then upload that part. The code can be found here:

https://github.com/aws-samples/cloudmosaic-dotnet/blob/reinvent-2019/Application/Clients/CloudMosaic.BlazorFrontend/FileUploader.cs#L37

I have been debating pulling that code out of the sample and putting it, in some form, into the SDK. Would appreciate hearing the community's thoughts on that idea.

@normj this implementation works well; I think you should put this into an SDK

Svisstack commented 3 years ago

We are using a similar algorithm. I agree that this should be in the SDK.

AlainH commented 2 years ago

I'm looking forward to seeing this implemented

dave-yotta commented 1 year ago

Just to chime in: @sstevenkang

If the content length is known, why is the stream not seekable?

On a very basic level, an HTTP request sends its Content-Length header before it begins sending the request body, so the content length is always known before the entire body is available. The only way for such a stream to be seekable is to buffer the data, either in memory or on disk, and that is a dangerous game to play.
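
To make that concrete with System.Net.Http (the method and names below are just an illustration, not SDK code): the body can be a forward-only stream while the length is declared separately in a header.

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ContentLengthIllustration
{
    // Illustration only: an HTTP body can be a forward-only stream while the
    // Content-Length header is supplied separately, before any body bytes are sent.
    public static Task<HttpResponseMessage> PutAsync(HttpClient client, string url, Stream nonSeekableBody, long knownLength)
    {
        var content = new StreamContent(nonSeekableBody);
        content.Headers.ContentLength = knownLength; // declared up front, not read from the stream

        return client.PutAsync(url, content);
    }
}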

jpestanota commented 1 year ago

FYI, just in case it helps, I got a pointer to this tool and examples and they seem promising: https://github.com/mlhpdx/s3-upload-stream

dave-yotta commented 1 year ago

This is exactly what we've done, and with the 5 MB buffer, uploads are significantly slower. Higher values risk memory usage issues.

This example buffers the data in memory, introducing an intermediate buffer between the input buffer (e.g. an incoming web request) and the output buffer (an outgoing web request). It would be better to rely on the internal buffers of the underlying .NET HTTP request made by the AWS SDK, sizing the buffer according to what System.Net.Http decides is appropriate given network conditions.

Even better, a System.IO.Pipelines implementation could eliminate all unnecessary buffers down to a minimum of one. But that would require AWS to actually put some work into their SDK, or someone to code this up against their REST APIs.

peswallstreet commented 1 year ago

Same thing here. Trying to upload a 36 MB file using TransferUtil.UploadAsync does not work. When I switched to the non-async version, everything works. Seriously guys, what the hell? Is this a joke, or can Amazon not handle async/await?

The code is as simple as it can be. Working solution:

public class S3Manager
{
    private static TransferUtility TransferUtil;

    static S3Manager()
    {
        AmazonS3Client amazonS3Client = new();
        TransferUtil = new TransferUtility(amazonS3Client);
    }

    public static async Task UploadFile(string filePath)
    {
        var bucketName = "absa.bucket";

        TransferUtil.Upload(new TransferUtilityUploadRequest
        {
            BucketName = bucketName,
            FilePath = filePath
        });

        await Task.FromResult(0);
    }
}

Does not work:

public class S3Manager
{
    private static TransferUtility TransferUtil;

    static S3Manager()
    {
        AmazonS3Client amazonS3Client = new();
        TransferUtil = new TransferUtility(amazonS3Client);
    }

    public static async Task UploadFile(string filePath)
    {
        var bucketName = "absa.bucket";

        await TransferUtil.UploadAsync(new TransferUtilityUploadRequest
        {
            BucketName = bucketName,
            FilePath = filePath
        });
    }
}

Additionally, I found this: https://stackoverflow.com/questions/59986035/aws-s3-transferutility-uploadasync-succeeds-but-does-not-create-file

normj commented 1 year ago

@peswallstreet I used both of your code snippets and they worked fine for me. The only way I can reproduce what you are describing is if the code calling S3Manager.UploadFile does not await it.
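
In other words, the failure mode described above is consistent with a fire-and-forget call rather than with UploadAsync itself; roughly (using the S3Manager from the snippet above):

// If the returned Task is not awaited, a short-lived process can exit before the
// background upload finishes, so the object never shows up in the bucket.
var fireAndForget = S3Manager.UploadFile(filePath); // bug: result never awaited

await S3Manager.UploadFile(filePath); // correct: completion (and any exception) is observed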

dave-yotta commented 1 year ago

I dunno what the last two comments are related to, but the issue reported here is related to non-seekable input streams.

Nijasbijilyrawther commented 1 year ago

Where are we on this?

We have a use case where we do an httpClient.GetStreamAsync() and pass that stream to the uploadObjectAsync endpoint. This is the only useful endpoint that doesn't actually store the entire content in memory. And we are dealing with downloads of attachments that can be many GBs as well.

While it works flawlessly for GCS (Google Cloud Storage) using their UploadObjectAsync endpoint, this whole "cannot read content length" issue and the limitations with non-seekable streams are giving us a hard time.

We really don't have the option to take everything into memory, since we run in container environments with restricted resources and don't really have a provision to assign/attach data volumes.

Do we have any workaround for this scenario?

jecc1982 commented 1 year ago

@Nijasbijilyrawther you can try this: https://github.com/jasonterando/S3BufferedUpload

Or https://github.com/mlhpdx/s3-upload-stream which was suggested in the previous comment.

peterrsongg commented 1 year ago

This was queued up as a Priority 2 item internally, so it hasn't gotten much attention, but the feature request has now been implemented and will go out in our next manual release. I appreciate the patience and will post here when the feature is out.

peterrsongg commented 1 year ago

@jecc1982 @Nijasbijilyrawther @NinoFloris @dave-yotta @jpestanota @AlainH @tapmantwo @Svisstack @tburnett80 @TommyN @peswallstreet @mrogunlana @p-selivanov @aron-truvian

This feature has been released in S3 version 3.7.202. I'm going to leave this issue open for a couple more days just in case there are any issues.
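
For anyone finding this later: assuming the new support means TransferUtility can now take the non-seekable stream directly (the s3Client, nonSeekableStream, bucket, and key names below are placeholders), usage should reduce to the call this issue originally asked for:

using (var transferUtility = new TransferUtility(s3Client))
{
    // Assumption based on the comment above: with Amazon.S3 3.7.202+, a
    // non-seekable stream no longer needs a seekable wrapper or full buffering.
    await transferUtility.UploadAsync(nonSeekableStream, "my-bucket", "my-key");
}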

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.