aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0

S3 crt client HeadObject call freezes. #3098

Open Manjunathagopi opened 2 weeks ago

Manjunathagopi commented 2 weeks ago

Describe the bug

We run a service that reads data from S3 in parallel across multiple threads. One day we found that every thread was still running, blocked on a HeadObject call to S3 that never returned a response. All new threads got stuck the same way, so a large number of threads accumulated. Restarting the service cleared the problem.

Unfortunately, we have not been able to reproduce this issue since. While investigating, we found a similar issue in the AWS SDK for JavaScript here. Could you please confirm if this is indeed a similar issue? The suggested solution in that case was to configure an HTTP timeout.

We considered doing the same, but we discovered that the S3 CRT client does not honor timeout configurations, as mentioned in this issue. Could you provide information on when the AWS S3 CRT client will support timeout configurations? This support is crucial to ensure that we do not encounter S3 API call freezes in the future.

Expected Behavior

The HeadObject call should not freeze.

Current Behavior

HeadObject call freezes.

Reproduction Steps

Unable to reproduce, but it's better to configure an HTTP timeout.

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.269

Compiler and Version used

gcc (GCC) 4.8.5

Operating System and version

CentOS Linux and version 7

jmklix commented 2 weeks ago

Thanks for taking the time to look for older similar issues, but it's hard to tell if your situation is similar to the 10 year old js-v1 issue. I don't have any timeline for when timeout configurations might be supported by the CRT client. I would recommend that you 👍 the feature request, because that helps us when prioritizing new feature requests.

We can also look more into why you are seeing the HeadObject call freeze if you give us more info:

Manjunathagopi commented 1 week ago

Hello @jmklix. Unfortunately we cannot reproduce this issue reliably, so we planned some overnight testing with AWS C++ SDK trace-level logging enabled. Luckily, the same issue occurred again. You can find the trace-level logs at this link (expires in 7 days).

NOTE: The issue started at the log line below:

2024-09-04T13:09:31+05:30 2024-09-04T07:39:31.129857333Z stdout F Sep 4 13:09:31.129 668_001 app: ,INFO, com.amagi.darti.s3_reader.buffered_reader, Opening s3://amagicloud-onecp-sigma8/Media/S3/668/+240211472001XA+.mxf

That line is our own custom log for the HeadObject call and is included only for reference.

DmitriyMusatkin commented 1 week ago

What version of curl are you building against, and can you try building against a newer curl? From the logs, it looks like curl is trying to establish a new connection, sees an existing one in the pool, but determines that it is dead, so it tries to obtain new IPs from DNS and gets stuck in a loop there. We don't have a lot of custom code around DNS resolution in the C++ SDK, so it might be due to some sort of bug in curl.

Note: for the S3 CRT client, CRT is only used in the put/get APIs. All the other APIs still go through the regular curl-based implementation (and timeout settings will apply to those as usual).

Manjunathagopi commented 1 week ago

@DmitriyMusatkin

  1. We are using curl 7.29.0.
  2. So configuring the timeout settings will cancel curl-based operations if they get stuck, and the HeadObject call will eventually return with an error? And does this apply even to the CRT client?

DmitriyMusatkin commented 1 week ago

curl 7.29.0 is over a decade old at this point. I would not be surprised if it has some issues with mva dns. Yes, the timeout should apply in this case. It looks like it is having trouble obtaining IPs to establish a connection, so the connection timeout should stop the attempt. As I mentioned, HeadObject in the s3-crt C++ client does not actually use CRT under the covers and goes through the regular S3 client path.

github-actions[bot] commented 21 hours ago

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.