Closed: NinoFloris closed this issue 1 year ago
Is there a reason why you are not making a direct put object request? Transfer utility is for concurrent multipart upload. Also, if you could provide sample code, that might help us understand the use case.
Yes because it is in our filesystem abstraction layer which also deals with bigger files, sometimes from disk. And so Transfer utility is (conceptually) a good fit for us.
I'll see if I can whip up something with aspnet core abstractions so it's small and clear
Here you go https://github.com/NinoFloris/awsrepro675
EDIT: Just tested FileBufferingReadStream and a non-seekable stream with PutObjectRequest as well; exact same errors.
@sstevenkang as you can read, those errors occur not only with TransferUtility but also with PutObjectRequest. I hope this warrants a different label than "Question".
I understand your use case, but S3 has always had the requirement that you must set the Content-Length whenever you do a PUT object. Here are the S3 docs, which point out that Content-Length is required:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
We also compute the SHA256 of the object which is passed as a header for data integrity.
This team and repo don't control how the S3 API works, just how it is interfaced with .NET. If you want to dive deeper into S3's design I suggest you reach out on the S3 forums.
The problem is not the fact that we don't know the Content-Length but that the sdk assumes the stream actually has and knows that length. This assumption is wrong and the behavior unidiomatic. There should be a way to give a content length to the sdk as an argument separate from the stream. Currently as far as I could tell most of this is internal and just pokes the stream for its length.
I am a little confused by the comment and also the comment in sample code:
There should be a way to give a content length to the sdk as an argument separate from the stream. Every request stream type out there will not be able to seek or give length
If the content length is known, why is the stream not seekable? That's what I am having a hard time following. I can understand the argument that the S3 operations should support unseekable streams--which we don't, due to performance reasons--but to say that it's more idiomatic for a function to take an unseekable stream and the size of the stream as parameters... seems odd.
For the same reason ;) Kestrel (the .NET Core web server), for instance, doesn't do buffering, for performance reasons. And the buffering stream that I linked to also doesn't report a length, due to unreliable internals (it only starts buffering after the first read, for instance). Stream length is just something that is hard to rely on; however, we always know the file size or content length from our incoming request, so we could pass a valid Content-Length to the sdk.
We just don't always have control over the type of input streams. If you think it shouldn't be in the sdk we'll just have to pass some wrapper stream that takes a stream and a length, an unnecessary allocation but so be it.
Besides all this, a content-length != a stream length; the two can have completely separate meanings and should be settable separately from each other.
We suffer the same problem - particularly if we are dealing with encrypted or compressed streams. We can't read the size from the stream, or seek. We have a kind-of workaround: if we use the simple PUT object for these types AND are told up front how big the file is, we can support them - but obviously the PUT request has a max size limit of 5 GB, which isn't ideal. A solution for multi-part uploads would help us significantly.
Note, we are also behind an abstraction, and S3 is only ONE of the concrete implementations.
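For illustration, the single-PUT fallback described above might look roughly like this (the method name, bucket, key, and knownLength are placeholders; whether a given SDK version honors the explicit Content-Length header instead of calling Stream.Length is exactly what this thread is about):

// Sketch of a single PUT for a non-seekable stream whose size is known up front.
// Only valid below the 5 GB single-PUT limit.
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static async Task PutKnownLengthAsync(
    IAmazonS3 s3, Stream nonSeekableStream, long knownLength, string bucket, string key)
{
    var request = new PutObjectRequest
    {
        BucketName = bucket,
        Key = key,
        InputStream = nonSeekableStream
    };
    // Supply the size explicitly instead of relying on Stream.Length.
    request.Headers.ContentLength = knownLength;

    await s3.PutObjectAsync(request);
}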
I've added this as a feature request in our backlog. We'll report back once we have more news regarding this work. I appreciate the detailed report, repro, and the discussion.
I hope this issue is still in the backlog, and that there are still intentions of resolving it. We hit the same problem.
Has there been any traction on this? It's been almost 2 years.
We gave up waiting and now use Minio https://github.com/minio/minio which supports not only specifying a length separately in the upload call, but also supports length -1 meaning "stream it out, I don't know how big it'll be".
Aspnet or IIS will protect you with a maximum body size limit for gigantic uploads anyway. This is perfect for APIs that want to proxy uploads after validation, with the upload destination retrieved from config.
We have a similar use case where we get files from a 3rd party through http that we want to upload to S3.
If you think it shouldn't be in the sdk we'll just have to pass some wrapper stream that takes a stream and a length, an unnecessary allocation but so be it.
I'm testing with a wrapper stream now and it seems to work. I'm assuming the Content-Length from the http-server is correct. Any idea if this will crash and burn at some point?
using (var response = await httpClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
{
    var length = response.Content.Headers.ContentLength;
    using (var stream = await response.Content.ReadAsStreamAsync())
    using (var wrapper = new StreamingStreamWrapper(stream, length.Value))
    using (var fileTransferUtility = new TransferUtility(s3Client))
    {
        await fileTransferUtility.UploadAsync(wrapper, bucket, key);
    }
}
StreamingStreamWrapper:
using System.IO;

internal class StreamingStreamWrapper : Stream
{
    private readonly Stream stream;
    private readonly long length;
    private long readCount = 0;

    public StreamingStreamWrapper(Stream stream, long length)
    {
        this.stream = stream;
        this.length = length;
    }

    public override bool CanRead => true;
    // Claim to be seekable so the SDK accepts the stream.
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    // Report the externally supplied length instead of asking the wrapped stream.
    public override long Length => length;
    public override long Position { get => readCount; set => throw new System.NotSupportedException(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytes = stream.Read(buffer, offset, count);
        readCount += bytes;
        return bytes;
    }

    // No real seeking; always reports offset 0.
    public override long Seek(long offset, SeekOrigin origin)
    {
        return 0;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new System.NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new System.NotSupportedException();
}
@TommyN I came up with a similar wrapper stream approach. In my case it worked fine, although when I checked my app memory consumption for big files (more than 100 MB) it turned out that TransferUtility buffers the whole input stream into memory!
After some searching I found this page and tried your approach. The result is the same: it works, but does full buffering internally. So we could just do a full read into a MemoryStream and skip the wrapper.
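For comparison, a rough sketch of that "just buffer it yourself" fallback (method and parameter names are mine, not from the SDK); it costs the same memory TransferUtility was already using internally:

using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

public static async Task UploadBufferedAsync(
    IAmazonS3 s3, Stream nonSeekableSource, string bucket, string key)
{
    // Fully buffer the non-seekable source; the MemoryStream is seekable and knows its Length.
    using var buffer = new MemoryStream();
    await nonSeekableSource.CopyToAsync(buffer);
    buffer.Position = 0;

    using var transferUtility = new TransferUtility(s3);
    await transferUtility.UploadAsync(buffer, bucket, key);
}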
BTW, our use case is as follows:
@p-selivanov I am doing the same thing. Using the Azure Blob Storage APIs I could just pass the stream down the line without having to first read it to get the length.
I switched to Minio because it allows the -1 to 'stream out' the content. It's just surprising, given how large a market share AWS has, that they can't handle non-seekable streams in a graceful manner. They continue making Azure a better choice for dotnet developers.
I'm running into this very same problem, and it's disappointing that Amazon hasn't given a solution for this yet.
My use case is that I'm taking zip files from one S3 bucket, and decompressing them into another S3 bucket via a lambda function. However, this won't work, since the stream(s) from the zip archive are not seekable.
Same problem, but in my case I need to calculate the SHA512 on the fly, so I wrap the FileStream in a CryptoStream, which of course is not seekable (as it is calculating the SHA512 on the fly).
For my .NET re:Invent presentation this year I needed to upload large files from the browser to S3. What I did was use multipart uploads and buffer the file in memory until I had a part's worth of data, then upload that part. The code can be found here:
I have been debating about pulling that code out of the sample and putting it in some form into the SDK. Would appreciate hearing the community's thoughts on that idea.
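Not the sample itself, but roughly the shape that approach takes with the low-level multipart APIs (the part size, method name, and lack of abort/error handling here are mine, not from the sample):

using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static async Task MultipartUploadUnseekableAsync(
    IAmazonS3 s3, Stream source, string bucket, string key)
{
    const int partSize = 5 * 1024 * 1024;   // S3 minimum part size (except for the last part)
    var init = await s3.InitiateMultipartUploadAsync(
        new InitiateMultipartUploadRequest { BucketName = bucket, Key = key });

    var partResponses = new List<UploadPartResponse>();
    var buffer = new byte[partSize];
    int partNumber = 1;
    bool endOfStream = false;

    while (!endOfStream)
    {
        // Fill the buffer; Read may return fewer bytes than requested, so loop until full or EOF.
        int filled = 0;
        while (filled < partSize)
        {
            int read = await source.ReadAsync(buffer, filled, partSize - filled);
            if (read == 0) { endOfStream = true; break; }
            filled += read;
        }
        if (filled == 0) break;   // nothing left to upload

        using var partStream = new MemoryStream(buffer, 0, filled);
        partResponses.Add(await s3.UploadPartAsync(new UploadPartRequest
        {
            BucketName = bucket,
            Key = key,
            UploadId = init.UploadId,
            PartNumber = partNumber++,
            PartSize = filled,
            InputStream = partStream
        }));
    }

    var complete = new CompleteMultipartUploadRequest
    {
        BucketName = bucket,
        Key = key,
        UploadId = init.UploadId
    };
    complete.AddPartETags(partResponses);
    await s3.CompleteMultipartUploadAsync(complete);
}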
@normj nice! How is that even a question?
What's the status here? It's been another 10 months. This is an absolute must-have.
S3 is the service to store large files, right?
Why does this API not allow a simple unseekable stream? I want to pass incoming request body (of my API) as a stream directly into S3.
Alternatively, I wouldn't mind if you exposed an API where you open a writeable stream for me that I copy data into. This would be even cleaner. Maybe like this:
using var stream = await s3.OpenPutObjectAsync(new OpenPutObjectRequest
{
    BucketName = bucketName,
    Key = key,
    ContentType = contentType,
    ContentLength = contentLength,
});
// Let me write whatever I want.
At this point in time you're forcing me to either:
@normj this implementation works well; I think you should put this into an SDK
We are using a similar algorithm. I agree that this should be in the SDK.
I'm looking forward to seeing this implemented
Just to chime in: @sstevenkang
If the content length is known, why is the stream not seekable?
On a very basic level, an HTTP request sends its headers, including Content-Length, before it begins sending the request body. The content length is always known before the entire body is available. The only way for such a stream to be seekable is to buffer the data either in memory or on disk - and that is a dangerous game to play.
FYI, just in case it helps, I got a pointer to this tool and examples and they seem promising: https://github.com/mlhpdx/s3-upload-stream
This is exactly what we've done, and with the 5 MB buffer uploads are significantly slower. Higher values risk memory usage issues.
This example buffers the data in memory, introducing an intermediate buffer between the input buffer (e.g. an incoming web request) and the output buffer (an outgoing web request) - it would be better to rely on the internal buffers of the underlying .NET HttpRequest made by the AWS SDK, sizing the buffer according to what System.Net.Http decides is appropriate given network conditions.
Even better, a System.IO.Pipelines implementation could eliminate all unnecessary buffers down to a minimum of 1. But that would require AWS to actually put some work into their SDK, or someone to code this up against their REST APIs.
The same thing. Trying to upload a 36 MB file using TransferUtil.UploadAsync does not work. When I switched to the non-async version, everything works. Seriously guys, what the hell? Is this a joke, or can Amazon not handle async/await?
Code is as simple as it can be. Working solution:
public class S3Manager
{
    private static TransferUtility TransferUtil;

    static S3Manager()
    {
        AmazonS3Client amazonS3Client = new ();
        TransferUtil = new TransferUtility(amazonS3Client);
    }

    public static async Task UploadFile(string filePath)
    {
        var bucketName = "absa.bucket";
        TransferUtil.Upload(new TransferUtilityUploadRequest
        {
            BucketName = bucketName,
            FilePath = filePath
        });
        await Task.FromResult(0);
    }
}
Does not work:
public class S3Manager
{
    private static TransferUtility TransferUtil;

    static S3Manager()
    {
        AmazonS3Client amazonS3Client = new ();
        TransferUtil = new TransferUtility(amazonS3Client);
    }

    public static async Task UploadFile(string filePath)
    {
        var bucketName = "absa.bucket";
        await TransferUtil.UploadAsync(new TransferUtilityUploadRequest
        {
            BucketName = bucketName,
            FilePath = filePath
        });
    }
}
Additionally, I found this: https://stackoverflow.com/questions/59986035/aws-s3-transferutility-uploadasync-succeeds-but-does-not-create-file
@peswallstreet I used both of your code snippets and they worked fine for me. The only way I can reproduce what you are saying is if my code that calls S3Manager.UploadFile does not await it.
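In other words, roughly the difference between these two call sites (illustrative only):

using System.Threading.Tasks;

// Looks broken: the Task returned by UploadFile is dropped, so the process can
// exit (or the request can complete) before the async upload actually finishes.
public static void FireAndForget(string filePath)
{
    _ = S3Manager.UploadFile(filePath);
}

// Works: awaiting lets the upload finish before continuing.
public static async Task Awaited(string filePath)
{
    await S3Manager.UploadFile(filePath);
}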
I dunno what the last two comments are related to, but the issue reported here is related to non-seekable input streams.
Where are we on this?
We have a use-case where we do an httpclient.GetStreamAsync() and pass that stream to the uploadObjectAsync endpoint. This is the only useful endpoint which actually doesn't store the entire content in memory. And we are dealing with the download of attachments that could be many GBs as well.
While it works flawlessly for GCS (Google Cloud Storage) using their UploadObjectAsync endpoint, this whole "cannot read content length" issue and the limitations around non-seekable streams are giving us a hard time.
We really don't have an option to take everything into memory, since we run in container environments with restricted resources and don't really have a provision to assign/attach data volumes.
Do we have any workaround for this scenario?
@Nijasbijilyrawther you can try this: https://github.com/jasonterando/S3BufferedUpload
Or https://github.com/mlhpdx/s3-upload-stream which was suggested in the previous comment.
This was queued up as a Priority 2 item internally so hasn't gotten much attention, but this feature request has been implemented and will go out in our next manual release. Appreciate the patience and will post here when the feature is out.
@jecc1982 @Nijasbijilyrawther @NinoFloris @dave-yotta @jpestanota @AlainH @tapmantwo @Svisstack @tburnett80 @TommyN @peswallstreet @mrogunlana @p-selivanov @aron-truvian
This feature has been released in S3 Version 3.7.202. I'm going to leave this issue open for a couple more days just in case there are any issues.
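For anyone landing here later, a minimal usage sketch, assuming the released support means TransferUtility.UploadAsync can now take a stream that neither seeks nor reports a Length (e.g. an incoming request body); the method and parameter names are placeholders:

using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

public static async Task UploadNonSeekableAsync(
    IAmazonS3 s3, Stream nonSeekableStream, string bucket, string key)
{
    using var transferUtility = new TransferUtility(s3);
    await transferUtility.UploadAsync(nonSeekableStream, bucket, key);
}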
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Expected Behavior
Things should behave: I should be able to pass through a stream that has no length / zero length and at least get decent exceptions; preferably it should be handled for the user. Any other scenario requires us to buffer the full upload in a stream that can do it all (seeking, accurate reporting of length, etc.) before we can give the stream to the S3 SDK. That's not nice when using it in a webserver, and it feels like we have to babysit the AWS SDK so it behaves...
Current Behavior
Request terminates with a NotSupportedException as the SDK tries to access the stream's length somewhere without catching the exception. The stack trace for this NotSupportedException case is posted at the bottom of this issue.
Steps to Reproduce (for bugs)
Create an aspnet project and add a POST endpoint for a simple upload transfer (just binary, no form data).
What also helps, as a non-seekable stream is not always practical, is to add this line; it is a helper that spools the request body to disk once it reaches a threshold.
Problem is that with that method the S3 client hangs and finally responds/times out with the exception seen at the bottom (the stack trace for the length 0 case).
Your Environment
dotnet --info
Product Information:
 Version:            1.0.4
 Commit SHA-1 hash:  af1e6684fd

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.12
 OS Platform: Darwin
 RID:         osx.10.12-x64
 Base Path:   /usr/local/share/dotnet/sdk/1.0.4
Stacktrace for length NotSupported exception:

System.NotSupportedException: Specified method is not supported.
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.FrameRequestStream.get_Length()
   at Amazon.S3.Transfer.TransferUtilityUploadRequest.get_ContentLength()
   at Amazon.S3.Transfer.TransferUtility.IsMultipartUpload(TransferUtilityUploadRequest request)
   at Amazon.S3.Transfer.TransferUtility.GetUploadCommand(TransferUtilityUploadRequest request, SemaphoreSlim asyncThrottler)
   at Amazon.S3.Transfer.TransferUtility.UploadAsync(Stream stream, String bucketName, String key, CancellationToken cancellationToken)
   ... our code ...
Stacktrace for length 0 case:

Amazon.S3.AmazonS3Exception: The provided 'x-amz-content-sha256' header does not match what was computed. ---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
   at Amazon.Runtime.HttpWebRequestMessage.<...>d__20.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler_mobile\HttpRequestMessageFactory.cs:line 404
   at Amazon.Runtime.Internal.HttpHandler`1.<InvokeAsync>d__9`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler\HttpHandler.cs:line 175
   at Amazon.Runtime.Internal.RedirectHandler.<InvokeAsync>d__1`1.MoveNext()
   at Amazon.Runtime.Internal.Unmarshaller.<InvokeAsync>d__...`1.MoveNext()
   at Amazon.Runtime.Internal.ErrorHandler.<InvokeAsync>d__5`1.MoveNext()
   --- End of inner exception stack trace ---
   at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleException(IExecutionContext executionContext, HttpErrorResponseException exception) in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\HttpErrorResponseExceptionHandler.cs:line 60
   at Amazon.Runtime.Internal.ErrorHandler.ProcessException(IExecutionContext executionContext, Exception exception) in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\ErrorHandler.cs:line 212
   at Amazon.Runtime.Internal.ErrorHandler.<InvokeAsync>d__...`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\ErrorHandler.cs:line 104
   at Amazon.Runtime.Internal.RetryHandler.<InvokeAsync>d__10`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\RetryHandler\RetryHandler.cs:line 131
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__...`1.MoveNext()
   at Amazon.S3.Internal.AmazonS3ExceptionHandler.<InvokeAsync>d__1`1.MoveNext()
   at Amazon.Runtime.Internal.ErrorCallbackHandler.<InvokeAsync>d__...`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\Handlers\ErrorCallbackHandler.cs:line 58
   at Amazon.S3.Internal.AmazonS3ResponseHandler.<InvokeAsync>d__1`1.MoveNext()
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__...`1.MoveNext()
   at Amazon.Runtime.Internal.MetricsHandler.<InvokeAsync>d__1`1.MoveNext()
   at Amazon.S3.Transfer.Internal.SimpleUploadCommand.<...>d__10.MoveNext()
   ... our code ...
   (repeated ExceptionDispatchInfo.Throw / TaskAwaiter rethrow frames omitted)