aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

Multipart upload requests suddenly get stuck without throwing any error #5561

Closed ishucr7 closed 11 months ago

ishucr7 commented 11 months ago

Checkboxes for prior research

Describe the bug

Overview

What's our setup?

What's the problem?

What's weird about it?

  1. It's not deterministically reproducible: it works throughout the day and then suddenly starts breaking.
  2. AWS permissions for the pods are not the problem: we performed the multipart upload from the same pod via a different route (boto3) and it worked.
  3. The same pod has another API that performs single-file uploads using the same S3 client, and that works too.

SDK version number

@aws-sdk/client-s3@3.374

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

Node.js 20.5.0

Reproduction Steps

That's what's weird about it: it's not deterministically reproducible. It just starts happening all of a sudden.

Observed Behavior

No stack traces to share; the code just stops executing.

Expected Behavior

For the SDK not to abruptly stop executing the call, or at least to throw an error when it does.

Possible Solution

No response

Additional Information/Context

No response

ishucr7 commented 11 months ago

Okay, found the issue. We create one single client at the start of the server and use that throughout the lifecycle of the server. All API calls use the same client.

And it's not actually the multipart API calls; rather, the client itself hangs, for the following reasons:

  1. We issue multiple parallel GetObject requests, which leaves fewer sockets in the client's pool for other API calls. The default socket pool size is small.
  2. Socket leaks are likely when you pipe the response from GetObject commands into another stream, which is what we were doing.
  3. The SDK's default configuration sets no socket timeout, so if something goes wrong in point 2, we lose that socket from the pool.

All of these reduce the number of sockets available in the pool for other S3 calls, eventually making the client hang.
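For point 2, here is a minimal sketch of the kind of cleanup that helps, using only Node.js built-ins. As far as I understand it, a plain `.pipe()` does not destroy the source stream when the destination fails, whereas `stream.pipeline` destroys every stream in the chain on error. `makeFakeBody`, `makeFailingDestination` and `copyWithPipeline` are hypothetical names for illustration; the fake body stands in for the `Body` of a GetObject response and is not our actual code.

```javascript
const { PassThrough, Writable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Hypothetical stand-in for a GetObject response body (a Readable in Node.js).
// It is left open on purpose, like a download that is still in progress.
function makeFakeBody() {
  const body = new PassThrough();
  body.write("chunk-1");
  body.write("chunk-2");
  return body;
}

// Hypothetical destination that fails on the first chunk,
// e.g. an HTTP response whose client has disconnected.
function makeFailingDestination() {
  return new Writable({
    write(chunk, encoding, callback) {
      callback(new Error("destination failed"));
    },
  });
}

// Copy source into destination; on any error pipeline() destroys both
// streams, so the underlying socket is not left dangling.
async function copyWithPipeline(source, destination) {
  try {
    await pipeline(source, destination);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}
```

With a real response you would pass the GetObject `Body` as `source`. The important part is that the source gets destroyed on failure instead of holding its socket.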

How to configure the client to avoid this is explained nicely in another GitHub issue, which tells the whole story.
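For points 1 and 3, a rough sketch of the settings involved, assuming the `NodeHttpHandler` options `httpsAgent`, `connectionTimeout` and `socketTimeout` as I understand them (option names have shifted between SDK versions, so check the one you are on). The numbers are placeholders, not recommendations, and `buildHandlerOptions` is a hypothetical helper. Only the Node.js built-in agent is constructed here; the SDK wiring is left as a comment.

```javascript
const https = require("node:https");

// Placeholder values (assumptions); tune them for your own workload.
const MAX_SOCKETS = 200;
const CONNECTION_TIMEOUT_MS = 5000;
const SOCKET_TIMEOUT_MS = 30000;

// Hypothetical helper: builds the options object for the request handler.
function buildHandlerOptions() {
  // A larger pool, so parallel GetObject calls do not starve other requests.
  const httpsAgent = new https.Agent({
    keepAlive: true,
    maxSockets: MAX_SOCKETS,
  });
  return {
    httpsAgent,
    // Give up if a connection cannot be established in time.
    connectionTimeout: CONNECTION_TIMEOUT_MS,
    // Give up on sockets that go idle, so a leaked one is not lost forever.
    socketTimeout: SOCKET_TIMEOUT_MS,
  };
}

const handlerOptions = buildHandlerOptions();

// Assumed wiring (requires the SDK packages, not loaded in this sketch):
//   const { S3Client } = require("@aws-sdk/client-s3");
//   const { NodeHttpHandler } = require("@smithy/node-http-handler");
//   const client = new S3Client({
//     requestHandler: new NodeHttpHandler(handlerOptions),
//   });
```

Creating the client once and reusing it is still fine; the point is only that its pool and timeouts are set explicitly instead of left at the defaults.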

github-actions[bot] commented 10 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.