aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0

GetObjectRequest hangs randomly. #152

Closed Stormsys closed 8 years ago

Stormsys commented 9 years ago

Randomly, my application hangs indefinitely waiting for a response inside the Amazon library, in AmazonS3Client.GetObject. See the stack trace below. The object is not being reused.

[image: stack trace screenshot]

here is a sample of the code run in that thread:

using (var amazonS3Client = CreateS3Client())
{
    var s3PartDownloadRequest = new GetObjectRequest
    {
        BucketName = download.Bucket,
        Key = download.ObjectKey,
        ByteRange = new ByteRange(position, end)
    };

    using (var getObjectResponse = amazonS3Client.GetObject(s3PartDownloadRequest)) //hangs on this line
    {    
        getObjectResponse.WriteResponseStreamToFile(dest);
    }
}
Stormsys commented 9 years ago

P.S. we are currently running AWS 2.3.11 but the issue was present with 2.3.9 also.

Stormsys commented 9 years ago

http://support.microsoft.com/kb/980817 is perhaps a likely culprit? The only thing is, I'm pretty sure we're running .NET 4.5.

Stormsys commented 9 years ago

Using the following code works, so I believe there's an issue with the sync version in the SDK:

using (var amazonS3Client = CreateS3Client())
{
    var s3PartDownloadRequest = new GetObjectRequest
    {
        BucketName = download.Bucket,
        Key = download.ObjectKey,
        ByteRange = new ByteRange(position, end)
    };

    using (var getObjectResponse = amazonS3Client.GetObjectAsync(s3PartDownloadRequest).GetResult())//async with my own sync method extension.
    {    
        getObjectResponse.WriteResponseStreamToFile(dest);
    }
}

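//GetResult:
public static T GetResult<T>(this Task<T> task)
{
    // Note: ConfigureAwait only affects an await on the value it returns,
    // so this call has no effect here because that value is discarded.
    task.ConfigureAwait(false);
    task.Wait();

    return task.Result;
}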
Stormsys commented 9 years ago

Bug also observed with "UploadPart" calls (non-async version). I believe there is a bug in your "Invoke" flow.

gokarnm commented 9 years ago

Thanks @Stormsys for reporting this issue and providing details! I'll look into this. I have a few questions about your application.

  1. Is your application a Windows or an ASP.NET application?
  2. Is the code which calls the S3 API multithreaded? If yes, how many concurrent threads are used?
  3. How frequently do you see this issue? Have you been able to replicate this issue on another machine?
  4. Have you noticed this issue with any other AWS SDK APIs apart from S3?

Because the hang is intermittent, it would be helpful to replicate this issue under different conditions.

  1. Can you switch to an http endpoint instead of https and check if you get the same issue? The following snippet shows how you can switch to http.

var s3config = new AmazonS3Config() { UseHttp = true };
var client = new AmazonS3Client(s3config);

Stormsys commented 9 years ago

Hi,

Just to add: with UploadPart the bug persists even with my sync wrapper, and sometimes even Async().Wait() never detects a completion state. Interestingly, we have not seen this in GetObject since we switched over to the async version.

To answer your questions:

  1. It's an x64 Windows application, running as a Windows service.
  2. The code is multithreaded and peaks at about 100 threads. However, at the point where the bug is visible and the debugger is attached, only 3 threads are active, and they are not interdependent as such (I'm certain it's not a classic deadlock). It might be worth mentioning that we have 32 hardware threads on the server running the code.
  3. I've seen this issue at least 1-2 times per day, perhaps every 500-700 requests or so.
  4. I'm not currently using any other services as it stands.

I can certainly try http, but realistically this probably would not be a satisfactory workaround.

gokarnm commented 9 years ago

Thanks @Stormsys , yes I understand, I suggested trying out http to isolate the behavior. Have you seen this behavior on any other server machines?

Stormsys commented 9 years ago

@gokarnm I will try the http setting tomorrow. We do have another server we can test on but have not yet done so; I can also do this if it helps.

saguiitay commented 9 years ago

Has anyone been able to resolve this issue? I'm facing it too...

theofanis commented 9 years ago

I also experience this sometimes, UploadPartAsync never returns, and actually, it doesn't even upload, since the StreamTransferProgress handler doesn't get anything.

I also provide a CancellationToken, and when the problem occurs, it even ignores the cancellation, so there's actually no way to get this task finished.

kobi commented 9 years ago

In case anyone still has this issue, I've found two changes that removed the problem:

Two more comments:

randall-peakey-com commented 8 years ago

I am seeing something very similar, although we are using the PutObject method. We can set the DefaultConnectionLimit very high, but as soon as that number of connections has been reached the application hangs.

Running 'netstat -a -n | find /c "54.231."' reports the same number of connections. A plain "netstat -a -n" shows that the connections are in the "CLOSE_WAIT" state.

These connections remain in this state until the application is terminated (this has caused problems when using the library in a website; it must be restarted to clear them).

My guess is that the underlying httpwebrequest connection is not being closed properly.

We have tried (without success) setting:

System.Net.ServicePointManager.DefaultConnectionLimit = 1000;
System.Net.ServicePointManager.SetTcpKeepAlive(false, 1, 1);
System.Net.ServicePointManager.MaxServicePointIdleTime = 10 * 1000;

djluck commented 8 years ago

I think the key might be to dispose of the GetObjectResponse as quickly as possible. In my program, I'm concurrently downloading the contents of an entire bucket (with 35 concurrent worker tasks). I noticed that I started seeing object requests hang indefinitely if I didn't immediately read the contents of GetObjectResponse.ResponseStream into memory and dispose of the stream. Fiddling with the DefaultConnectionLimit didn't seem to offer any improvement, only the quick disposal of the stream made any difference for me.
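
As a rough illustration of the pattern described above (not SDK code), the helper below copies a response body into memory and disposes the response in the same step. The ResponseBuffering class and the BufferAndDisposeAsync name are made up for this sketch, and it is written generically so that it only needs base class library types.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ResponseBuffering
{
    // Copies the response body into memory, then disposes the response so the
    // underlying connection can be released before the caller uses the bytes.
    public static async Task<byte[]> BufferAndDisposeAsync<TResponse>(
        TResponse response, Func<TResponse, Stream> getStream)
        where TResponse : IDisposable
    {
        using (response)
        using (var buffer = new MemoryStream())
        {
            await getStream(response).CopyToAsync(buffer).ConfigureAwait(false);
            return buffer.ToArray();
        }
    }
}
```

With the SDK this might be called as ResponseBuffering.BufferAndDisposeAsync(await client.GetObjectAsync(request), r => r.ResponseStream), assuming each object is small enough to hold in memory.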

MikesGlitch commented 8 years ago

I'm using the Quartz Scheduler for .Net and I was receiving the problem intermittently during a Job Execution. I have implemented Kobi's and Stormsys's posts and they both improved matters - but didn't fully fix the problem.

The only thing that HAS managed to fix the problem has been to make a new instance of my S3Client in my Job Execution class whenever it executes rather than resolving it using dependency injection - I think it has something to do with thread safety. After implementing this, along with Kobi's and Stormsys's posts I haven't experienced the problem any more.

thoean commented 8 years ago

I have a related problem, using the asp.net core SDK version 3.2.3-beta. My problem is related to executing GetObjectAsync calls in parallel, but reading through the thread, it might be highly related. I've posted the problem at http://stackoverflow.com/questions/37471477/download-files-from-s3-in-parallel-aws-net-sdk before I found this thread.

Does the AmazonS3Client have any synchronization or shared state?

lewislabs commented 8 years ago

I've been experiencing this, and on investigating the issue I think the line to blame is https://github.com/aws/aws-sdk-net/blob/aws-sdk-net-v2/AWSSDK_DotNet35/Amazon.Runtime/Pipeline/HttpHandler/HttpHandler.cs#L104. If the GetResponse method throws a WebException internally, then the response stream will never be closed. That's consistent with seeing connections hanging in the close_wait state.
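
A minimal sketch of the kind of cleanup this would call for, assuming the leak really is an unclosed error response. SafeWebRequest is a hypothetical helper, not the SDK's HttpHandler code and not necessarily what any eventual fix looks like; a caller that needs the error body would have to read it before closing.

```csharp
using System.Net;

public static class SafeWebRequest
{
    // Hypothetical helper: if GetResponse throws, close the error response
    // attached to the exception (if any) before letting the exception propagate.
    public static HttpWebResponse GetResponseOrCleanUp(HttpWebRequest request)
    {
        try
        {
            return (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex)
        {
            // WebException.Response is null when no response was received at all.
            ex.Response?.Close();
            throw;
        }
    }
}
```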

sstevenkang commented 8 years ago

The new 3.3.1 version of Core contains the PR https://github.com/aws/aws-sdk-net/pull/449 which addresses this problem. Please let us know if the problem persists. Thanks!

sstevenkang commented 8 years ago

Closing due to inactivity. If any of you guys encounter this issue again, feel free to reopen it. Thanks!

rsrini83 commented 7 years ago

I'm not sure how to reopen this issue. So adding my problem here.

We have a server-side .NET application built on the NancyFx framework, running self-hosted. The application receives multipart requests containing multiple files (around 100), all of which are supposed to be uploaded to an S3 bucket. We use Parallels to upload the files, and right now we create an s3 object for every task. This causes too many HTTP connections, and after a while the system becomes slow or S3 latency increases. We have tuned at the TCP level by reducing TcpTimedWaitDelay to 30 seconds.

Can anyone help us resolve this issue? How can we reduce the AmazonS3Client HTTP connection pool?

Using Windows 2012, AWS SDK version: 3.3.7

Let me know if any further information is required.

Thanks in advance.

PavelSafronov commented 7 years ago

Separate issue was opened for this question, no need to re-open - https://github.com/aws/aws-sdk-net/issues/546

onyxmaster commented 7 years ago

Hangs on AWSSDK.Core 3.3.12 + AWSSDK.S3 3.3.5.12.

mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext)
System.dll!System.Net.LazyAsyncResult.WaitForCompletion(bool snap)
System.dll!System.Net.HttpWebRequest.GetResponse()
AWSSDK.Core.dll!Amazon.Runtime.Internal.HttpRequest.GetResponse()
AWSSDK.Core.dll!Amazon.Runtime.Internal.HttpHandler<System.IO.Stream>.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.RedirectHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.Unmarshaller.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.S3.dll!Amazon.S3.Internal.AmazonS3ResponseHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.ErrorHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.CallbackHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.RetryHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.CallbackHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.CallbackHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.S3.dll!Amazon.S3.Internal.AmazonS3ExceptionHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.MetricsHandler.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.Internal.RuntimePipeline.InvokeSync(Amazon.Runtime.IExecutionContext executionContext)
AWSSDK.Core.dll!Amazon.Runtime.AmazonServiceClient.Invoke<Amazon.S3.Model.PutObjectRequest, Amazon.S3.Model.PutObjectResponse>(Amazon.S3.Model.PutObjectRequest request, Amazon.Runtime.Internal.Transform.IMarshaller<Amazon.Runtime.Internal.IRequest, Amazon.Runtime.AmazonWebServiceRequest> marshaller, Amazon.Runtime.Internal.Transform.ResponseUnmarshaller unmarshaller)
AWSSDK.S3.dll!Amazon.S3.AmazonS3Client.PutObject(Amazon.S3.Model.PutObjectRequest request)
craigbrett17 commented 7 years ago

I'm still noticing this behaviour.

AWSSDK.Core: version=3.3.13.1 AWSSDK.S3 version=3.3.5.10

Increasing the ServicePointManager.DefaultConnectionLimit to a nice high number temporarily alleviates the problem. But I'm not sure what to do otherwise. I don't think the quick disposal thing is our problem as we're not really dealing with concurrent requests at this time.

randall-peakey-com commented 7 years ago

In our experience concurrency doesn't really matter: if you don't dispose, the connection will hang around until the application terminates. We are simply disposing of the object as soon as possible and all the problems have disappeared.

craigbrett17 commented 7 years ago

@randall-peakey-com: Interesting. Okay, this will be my next attempt at a fix. Right now we return the whole GetObjectResponse and handle it elsewhere in the code, so I might just have to rewrite it to use only the stream and copy it out to a MemoryStream and return that. Thanks for the info.

craigbrett17 commented 7 years ago

Alternatively, just rewrite everything that was using the GetObjectResponse to just be inside a using statement and it's actually done the trick, even without the DefaultConnectionLimit change. Thanks @randall-peakey-com!

ghost commented 7 years ago

We have been hitting this issue sporadically in our application for over a year - always while hitting the S3 api concurrently on multiple threads. We just hit again tonight and I got a stack trace of the hung thread. See below. We have hit this on different sdk calls: GetObject, GetObjectMetadata, and PutObject. Our situation was greatly improved a while ago by auditing every call to the sdk to make sure we weren't leaking requests or s3 clients.

Here's the stack trace I captured tonight. At the time we detected this hang, we had 16 threads concurrently hitting S3. This is on version 3.3.0 of the SDK on .NET Framework 4.7, Windows Server 2016, running on a machine inside EC2. Does anything jump out here? I haven't tried moving all our S3 access to http (as opposed to https) as I see suggested above. Is that still a recommended workaround?


StackTrace: at System.Net.UnsafeNclNativeMethods.OSSOCK.recv(IntPtr socketHandle, Byte* pinnedBuffer, Int32 len, SocketFlags socketFlags)
at System.Net.UnsafeNclNativeMethods.OSSOCK.recv(IntPtr socketHandle, Byte* pinnedBuffer, Int32 len, SocketFlags socketFlags)
at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, SocketError& errorCode)
at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.FixedSizeReader.ReadPacket(Byte[] buffer, Int32 offset, Int32 count)
at System.Net.Security._SslStream.StartFrameHeader(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.StartReading(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.TlsStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.PooledStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.Connection.SyncRead(HttpWebRequest request, Boolean userRetrievedStream, Boolean probeRead)
at System.Net.ConnectStream.ProcessWriteCallDone(ConnectionReturnResult returnResult)
at System.Net.ConnectStream.CallDone(ConnectionReturnResult returnResult)
at System.Net.ConnectStream.CloseInternal(Boolean internalCall, Boolean aborting)
at System.Net.ConnectStream.System.Net.ICloseEx.CloseEx(CloseExState closeState)
at System.Net.HttpWebRequest.EndWriteHeaders_Part2()
at System.Net.HttpWebRequest.EndWriteHeaders(Boolean async)
at System.Net.HttpWebRequest.WriteHeadersCallback(WebExceptionStatus errorStatus, ConnectStream stream, Boolean async)
at System.Net.ConnectStream.WriteHeaders(Boolean async)
at System.Net.HttpWebRequest.EndSubmitRequest()
at System.Net.Connection.SubmitRequest(HttpWebRequest request, Boolean forcedsubmit)
at System.Net.ServicePoint.SubmitRequest(HttpWebRequest request, String connName)
at System.Net.HttpWebRequest.SubmitRequest(ServicePoint servicePoint)
at System.Net.HttpWebRequest.GetResponse()
at Amazon.Runtime.Internal.HttpRequest.GetResponse()
at Amazon.Runtime.Internal.HttpHandler`1.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RedirectHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.Unmarshaller.InvokeSync(IExecutionContext executionContext)
at Amazon.S3.Internal.AmazonS3ResponseHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.S3.Internal.AmazonS3ExceptionHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.MetricsHandler.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.Internal.RuntimePipeline.InvokeSync(IExecutionContext executionContext)
at Amazon.Runtime.AmazonServiceClient.Invoke[TRequest,TResponse](TRequest request, IMarshaller`2 marshaller, ResponseUnmarshaller unmarshaller)
at Amazon.S3.AmazonS3Client.GetObjectMetadata(GetObjectMetadataRequest request)
Plasma commented 6 years ago

This may be related to https://github.com/dotnet/corefx/issues/21796 (.NET FX deadlock on sockets cleanup and due to ServicePoint.set_ConnectionLimit internal locks)

I've started getting deadlocks due to many S3 SDK GetObject/Metadata/Exists calls and eventually something deadlocks exactly like the above issue.

Plasma commented 6 years ago

I've posted a workaround that is currently working for us, I believe: https://github.com/dotnet/corefx/issues/21796#issuecomment-381515493

cullenjohnson commented 5 years ago

@djluck commented on Mar 21, 2016:

I think the key might be to dispose of the GetObjectResponse as quickly as possible. In my program, I'm concurrently downloading the contents of an entire bucket (with 35 concurrent worker tasks). I noticed that I started seeing object requests hang indefinitely if I didn't immediately read the contents of GetObjectResponse.ResponseStream into memory and dispose of the stream.

Fiddling with the DefaultConnectionLimit didn't seem to offer any improvement, only the quick disposal of the stream made any difference for me.

This was exactly my problem. We were mistakenly calling Dispose on the resultant Task<GetObjectResponse> (returned from GetObjectAsync) instead of the .Result of the task.

Task<GetObjectResponse> s3FileResponse = S3Client.GetObjectAsync(S3BucketName, s3FilePath);

// ...

try
{
    // ...
}
finally
{
    s3FileResponse.Result?.Dispose();
    // Changed from the following incorrect line:
    //s3FileResponse?.Dispose();
}
ghost commented 5 years ago

It's April 2019 and, years later, we are STILL experiencing random and sporadic hangs when accessing S3 via the .NET AWSSDK with a high degree of parallelism.

We've tried disabling SSL. We've done multiple audits for leaked IDisposables. We've messed with ServicePointManager.DefaultConnectionLimit. The hangs still persist.

Are others still experiencing this issue as well?

Plasma commented 5 years ago

Hey @tomlor,

We worked around this issue by first applying:

https://github.com/dotnet/corefx/issues/21796#issuecomment-381515493

And for downloading the blob data, we instead just generate a signed URL via the s3 client, and use HttpClient to download it instead of the s3 SDK. No issues now for 12 months.

dyardyGIT commented 4 years ago

I'm not sure how to reopen this issue. So adding my problem here.

We have a server-side .NET application built on the NancyFx framework, running self-hosted. The application receives multipart requests containing multiple files (around 100), all of which are supposed to be uploaded to an S3 bucket. We use Parallels to upload the files, and right now we create an s3 object for every task. This causes too many HTTP connections, and after a while the system becomes slow or S3 latency increases. We have tuned at the TCP level by reducing TcpTimedWaitDelay to 30 seconds.

Can anyone help us resolve this issue? How can we reduce the AmazonS3Client HTTP connection pool?

Using Windows 2012, AWS SDK version: 3.3.7

Let me know if any further information is required.

Thanks in advance.

I have been fighting this issue for 6 months and still have no good resolution. Help

clevrdavid commented 4 years ago

I think there is still an issue here.

Strangely, I'm getting this issue in an ASP.NET Core 3.1 Web API project, but not in an ASP.NET Core 3.1 MVC project. The code calling the GetObjectAsync task is exactly the same in both projects and uses the same credentials and bucket.

Been tearing my hair out all day, when this should be simple.

Plasma commented 4 years ago

@dyardyGIT @clevrdavid Depending on the stack trace you get when things get locked up, https://github.com/dotnet/runtime/issues/22592#issuecomment-381515493 may fix this for you like it did for us.

As for too many tasks, perhaps wrap your upload code path in a Polly Bulkhead code block which will help throttle the parallelism: https://github.com/App-vNext/Polly/wiki/Bulkhead
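
If pulling in Polly is not an option, a similar throttling effect can be sketched with SemaphoreSlim from the base class library. This is a swapped-in technique rather than Polly's Bulkhead (it has no queue limit and never rejects work), and Throttle.ForEachAsync is a hypothetical name chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs work(item) for every item, allowing at most maxParallel
    // invocations to be in flight at any one time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxParallel, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot before starting this item.
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    await work(item).ConfigureAwait(false);
                }
                finally
                {
                    // Free the slot even if the work item failed.
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

A call such as await Throttle.ForEachAsync(files, 8, file => UploadAsync(file)) would then cap uploads at 8 at a time, where UploadAsync stands in for whatever upload method the application already has.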

dyardyGIT commented 4 years ago

I am shocked that there is no legitimate answer to this issue. I understand the 'randomness' makes it difficult, but we have tried many things on both .NET Framework and .NET Core. This issue really surfaced to a much greater extent when we made our move from Windows 2008 to Windows 2012. On 2012 we have any to address http connection/socket issues.

We have tried the suggestion of setting ServicePointManager.DefaultConnectionLimit=50, with no success.

randall-peakey-com commented 4 years ago

@dyardyGIT Disposing has certainly eliminated this problem for many. https://github.com/aws/aws-sdk-net/issues/152#issuecomment-297933416 Have you given this a try?

dyardyGIT commented 4 years ago

And for downloading the blob data, we instead just generate a signed URL via the s3 client, and use HttpClient to download it instead of the s3 SDK. No issues now for 12 months.

Can you share how you generate a URL using the SDK? Thanks!

dyardyGIT commented 4 years ago

Yes, it looks like this now, and still having the issue.

using (GetObjectResponse response = _amazonClient.GetObject(request))
{
    using (Stream responseStream = response.ResponseStream)
    {
        amazonFile = new AmazonFile();
        amazonFile.FileBytes = ReadStream(response.ResponseStream);
        amazonFile.Size = amazonFile.FileBytes.LongLength;
    }
}

Plasma commented 4 years ago

Looks like our problem was mostly upload related, where it would hang, so we made our upload code just get the signed URL to upload to and PUT the data via regular HTTP calls:

        /// <summary>
        /// Upload to the specified key the provided stream of data
        /// </summary>
        async Task UploadUsingWebRequestAsync (string key, Stream stream) {
            var client = CreateClient ();

            // Calculate URL to upload to
            var signedRequest = new GetPreSignedUrlRequest {
                BucketName = BucketName,
                Key = key,
                Verb = HttpVerb.PUT,
                Expires = DateTime.UtcNow.AddHours (1)
            };

            // Generate Url
            var uploadUrl = client.GetPreSignedURL (signedRequest);

            // Perform Upload
            // Create content
            var streamContent = new StreamContent (stream);

            // Create Retry Policy
            var retryPolicy = Policy<HttpResponseMessage>
                .Handle<HttpRequestException> ()
                .WaitAndRetryAsync (3, x => TimeSpan.FromSeconds(x));

            // Put File
            using (var response = await retryPolicy.ExecuteAsync (() => SharedClient.PutAsync(uploadUrl, streamContent))) {
                // Verify
                if (!response.IsSuccessStatusCode)
                    throw new ArgumentException ($"Upload of blob failed ({key}): {await response.Content.ReadAsStringAsync()}");
            }
        }

As for downloading, we do less of that, but you can imagine the flow is similar (client.GetPreSignedURL), then just do an HTTP GET on that URL.
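
A sketch of what that download flow might look like, under the assumption that the URL has already been produced by client.GetPreSignedURL with Verb = HttpVerb.GET, the same way the upload code above builds its PUT URL. PreSignedDownload and DownloadAsync are hypothetical names, and the HttpClient would typically be a shared instance.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class PreSignedDownload
{
    // Issues a plain HTTP GET on the pre-signed URL and streams the body
    // into destination without buffering the whole object in memory.
    public static async Task DownloadAsync(
        HttpClient http, string preSignedUrl, Stream destination)
    {
        using (var response = await http.GetAsync(
            preSignedUrl, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false))
        {
            // Throws HttpRequestException on a non-success status code.
            response.EnsureSuccessStatusCode();

            using (var body = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
            {
                await body.CopyToAsync(destination).ConfigureAwait(false);
            }
        }
    }
}
```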

randall-peakey-com commented 4 years ago

Yes, it looks like this now, and still having the issue.

using (GetObjectResponse response = _amazonClient.GetObject(request))
{
    using (Stream responseStream = response.ResponseStream)
    {
        amazonFile = new AmazonFile();
        amazonFile.FileBytes = ReadStream(response.ResponseStream);
        amazonFile.Size = amazonFile.FileBytes.LongLength;
    }
}

I see you are disposing of the Stream and the GetObjectResponse, but not the AmazonS3Client.

pavisalavisa commented 4 years ago

I'm a bit late to the party but I'd like to give my two cents on this issue.

We're running .NET 4.7 on windows server 2016. We have multiple worker instances (some of which are running Quartz) and multiple instances running ASP.NET Web API. API instances never displayed any problems regarding S3 requests hanging, however, worker instances did.

At first, I thought that we might have some kind of deadlock because logging in place wasn't verbose enough to pinpoint the exact location where the application stopped. It was clear that these hangs were happening only when the system was under a considerable load.

Dumping the stack trace pointed me in the wrong direction. It showed that my service blocked on the following snippet executing Exists:

public bool ObjectExists(string bucketName, string objectKey)
{
    var s3FileInfo = new S3FileInfo(_s3Client, bucketName, objectKey);
    return s3FileInfo.Exists;
}

I've tried everything I could find related to this issue including updating to .NET 4.8, using single instance S3 client, using different S3 clients for every unit of work, changing the existing implementation to use different SDK methods, etc.

Tuning ServicePointManager.DefaultConnectionLimit = N certainly did have an effect on how soon the service ground to a halt. Using netstat I noticed that there were N connections to S3 with CLOSE_WAIT status. According to the TCP specification, this state indicates that the remote side (S3) has closed its end of the connection and the local TCP stack has acknowledged it, but the client (your application) has not yet closed the socket.

This information steered me away from the problem source and into the esoteric search for the bug in the framework (hence the update to a newer version of the framework). While there might be cases where the framework bug caused your implementation to misbehave, that wasn't my case.

Enter the following comment:

@dyardyGIT Disposing has certainly eliminated this problem for many. #152 (comment) Have you given this a try?

I inspected the facade implementation that we used to wrap S3 related actions and discovered that the response from one of the s3Client.GetObjectAsync(getObjectRequest) calls was not disposed of. That method is called in the normal service flow but it's kind of buried deeper in the service layer. This caused the number of open connections to grow and eventually led the service to a grinding halt. The interesting part is that because of the way the service was implemented, it would always stop on the same method (ObjectExists shown above) where there was no need for disposal per se.

Disposing of the object properly meant that no connections were left hanging and that the service could handle bigger loads without stopping.

I still don't understand why the SDK lets this happen in the first place. I would've been happier with an exception telling me that the number of connections has been exceeded and that new connections cannot be established.

brinkdinges commented 4 years ago

I think I'm running into this issue as well. I'm trying to upload a 30kB text file from a .NET 4.7.2 desktop app with AWSSDK.S3 3.3.111.33. I run as a plugin in another app, which might be single-threaded.

This line from the official docs never uploads the file and hangs until I force close the application. This happens every single time. I also see lingering CLOSE_WAIT items in netstat.

PutObjectResponse response = await client.PutObjectAsync(putRequest);

I can only upload the file and close the request when I wrap the PutObjectAsync in a using statement, even though the only disposable object is the Task that it creates.

public static void UploadFile()
{
   // pseudocode
   using (credentials)
   using (client)
   create putRequest
   PutObjectAsync(client, putRequest).Wait();
}

public static async Task PutObjectAsync(AmazonS3Client client, PutObjectRequest putRequest)
{
  using (var task = client.PutObjectAsync(putRequest))
  {
    var success = task.Result.HttpStatusCode == HttpStatusCode.OK;
    if (!success) throw new CannotUploadFile();
  }
}

But this is no longer async since there is no await in the PutObjectAsync method. So the UI becomes unresponsive. Any ideas on how to work around this?

A second, related issue is that this behavior gets even worse when there is no internet connection. My working sample above times out at about 100 seconds. That's a long time to wait. I have found no setting that had any effect on this.

Plasma commented 4 years ago

@brinkdinges my workaround I'd suggest is to bypass the SDK for uploading, see my comment here https://github.com/aws/aws-sdk-net/issues/152#issuecomment-587210064

The workaround is to have the SDK generate the HTTPS Upload URL (that includes the signing key), then use HttpClient to PUT the data directly (pretty much what the SDK does, anyway).

Plasma commented 4 years ago

@brinkdinges your using statement is also wrong, you should await the task to ensure you are async:

public static async Task PutObjectAsync(AmazonS3Client client, PutObjectRequest putRequest)
{
  // Do not dispose of the task; dispose of the async result (and await the task instead of accessing the .Result property directly).
  // I am not at a computer; perhaps the result of PutObjectAsync does not implement IDisposable, in which case the using is not required.
  using (var putResponse = await client.PutObjectAsync(putRequest))
  {
    var success = putResponse.HttpStatusCode == HttpStatusCode.OK;
    if (!success) throw new CannotUploadFile();
  }
}
brinkdinges commented 4 years ago

@Plasma Thank you. I had already tried to use your implementation, but I couldn't find what the SharedClient and Policy were. I found the policies in the SDK, but none that used generics. Could you clarify those two?

PutObjectAsync indeed doesn't implement IDisposable. But without the using, the request never ends or gives a result. That is the reason I'm trying all these workarounds. I just tried GetObjectAsync and the same happens, it also never returns a result.

Right now I have wrapped the whole method that creates credentials, the client and the request in a single task and I await that. This works for now.

Plasma commented 4 years ago

Ah, SharedClient is just an instance of HttpClient, and Policy is Polly.Net Retry Policy -- this part is kinda optional and you can skip it.

But in your original question above, you pasted code with a known async anti-pattern, where a deadlock can definitely occur, because you are not awaiting the task but instead accessing the .Result property directly. Ignoring my workaround, what if you change that whole method to this?

public static async Task PutObjectAsync(AmazonS3Client client, PutObjectRequest putRequest)
{
  var result = await client.PutObjectAsync(putRequest);
  var success = result.HttpStatusCode == HttpStatusCode.OK;
  if (!success) throw new CannotUploadFile();
}
brinkdinges commented 4 years ago

I am very sure that's what I started with. It didn't work then, it does now 😄 Thanks for pushing me to do it the correct way.