aws / aws-sdk-java

The official AWS SDK for Java 1.x. The AWS SDK for Java 2.x is available here: https://github.com/aws/aws-sdk-java-v2/
https://aws.amazon.com/sdkforjava
Apache License 2.0

400 Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. #1101

Closed yuriti89 closed 7 years ago

yuriti89 commented 7 years ago

Hi!

I'm using aws-sdk v1.11.5, and since 28 March I have been getting rare error responses (status code 400) when calling putObject: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

Before 28 March there were no such errors. The max retry count is set to 20 in ClientConfiguration.

Why do these errors occur?
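For reference, a minimal sketch of roughly how a client with that retry setting might be constructed on SDK 1.11.5; the credential handling and variable names here are illustrative, not taken from the reporter's code:

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;

    // ClientConfiguration carrying the retry count mentioned above
    ClientConfiguration config = new ClientConfiguration()
            .withMaxErrorRetry(20);

    AmazonS3 s3 = new AmazonS3Client(new DefaultAWSCredentialsProviderChain(), config);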

millems commented 7 years ago

Hi, thanks for contacting us!

  1. You're saying this error did not start happening after an upgrade to the SDK, correct?
  2. Is there a stack trace associated with this error?
millems commented 7 years ago

One additional question:

  1. What data source (file, network connection, in-memory byte array, etc.) are you using as the input stream in your put object request?
yuriti89 commented 7 years ago
  1. Yes, I didn't update the SDK.
  2. Not at the moment. I'll extend the logging over the next two days and capture a stack trace as soon as I catch the exception.
  3. I use a file, via this constructor for the request (see the sketch below): public PutObjectRequest(String bucketName, String key, File file) { super(bucketName, key, file); }
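As a sketch of that code path (the bucket, key, and file path below are hypothetical):

    import java.io.File;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.PutObjectRequest;

    AmazonS3 s3 = new AmazonS3Client();  // credentials resolved from the default provider chain

    // File-based PutObjectRequest: the SDK knows the content length up front and can
    // re-read the file if the request has to be retried
    s3.putObject(new PutObjectRequest("my-bucket", "path/to/object.bin", new File("/tmp/example.bin")));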
millems commented 7 years ago

Thanks for extending the logging! If possible, could you also enable debug-level logging so that we can see the network events as well?

This message can occur if your input stream is too large (or is read too slowly). If you can see the timing in your current logs, how much clock time has passed between when you begin to send the request, and when the error is received?

yuriti89 commented 7 years ago

All 20 retries take ~40-50 seconds, and the file isn't large. For example, a file of about 100 KB took 42 seconds across 20 retries.

yuriti89 commented 7 years ago

Here is the stack trace (with the request IDs replaced):

com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: XXX), S3 Extended Request ID: XXXXX
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1305)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:852)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:630)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:405)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:367)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:318)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3787)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1399)

millems commented 7 years ago

Are you using the default socket timeout in your Client Configuration when you create the S3 client?

yuriti89 commented 7 years ago

Yes, 50000 ms for socket timeout and 10000 ms for connection timeout.
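Those values match the 1.x defaults; set explicitly on a ClientConfiguration, they would look roughly like this (a sketch):

    import com.amazonaws.ClientConfiguration;

    ClientConfiguration config = new ClientConfiguration()
            .withSocketTimeout(50_000)       // 50 s socket timeout (the default)
            .withConnectionTimeout(10_000);  // 10 s connection timeout (the default)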

millems commented 7 years ago

And to clarify, all 20 retries occur in the span of 40-50 seconds total? An average of ~2-3 seconds per request?

If so, can you please provide an example request ID? I'd like to follow up with S3 directly to see if they have information on why the request is timing out.

yuriti89 commented 7 years ago

Request ID: D69FF5D8E0FCFE36 Extended Request ID: ha1jka3RZmHrlJCwydpRHCr1QldR36RBrM3X2j823v/QMlK/ODbTa49/ykPmNLhtObVDaLu7vM8

The error with these IDs occurred at 2017-04-07 18:11:10 (GMT). The first request started at about 18:10:27, and all retries had failed by 18:11:10.

fernomac commented 7 years ago

Were there actually any retries done? I wouldn't expect the SDK to retry on a 400.
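For completeness, if an application did want this specific error to be retried, a custom retry condition is one way to express that. This is a sketch against the 1.x retry API, not a statement about what the default policy does:

    import com.amazonaws.AmazonServiceException;
    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.retry.PredefinedRetryPolicies;
    import com.amazonaws.retry.RetryPolicy;

    // Retry when S3 reports the RequestTimeout error code, otherwise defer to the default condition
    RetryPolicy.RetryCondition retryRequestTimeouts = (request, exception, attempts) ->
            (exception instanceof AmazonServiceException
                    && "RequestTimeout".equals(((AmazonServiceException) exception).getErrorCode()))
            || PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION.shouldRetry(request, exception, attempts);

    ClientConfiguration config = new ClientConfiguration().withRetryPolicy(
            new RetryPolicy(retryRequestTimeouts,
                            PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,
                            20,      // max error retries
                            true));  // honor maxErrorRetry from ClientConfiguration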

yuriti89 commented 7 years ago

Hmm, I didn't expect that.

millems commented 7 years ago

You might be able to determine the issue by performing a packet capture or enabling debug-level logging.

Response from S3:

The error you have received indicates that S3 never received your request or its related data; this eventually resulted in S3 timing out the connection due to idleness.

This can occur in instances where network connectivity is impacted by one or more of the following:

There may be other network or system related issues which could also cause similar behavior.

yuriti89 commented 7 years ago

I'll enable debug-level logging for com.amazonaws.
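As a sketch, assuming log4j 1.x is the logging backend (a log4j.properties entry works just as well), the relevant loggers can be raised to DEBUG programmatically:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    // SDK request/response summaries, signing, and connection handling
    Logger.getLogger("com.amazonaws").setLevel(Level.DEBUG);
    // Raw HTTP traffic from the underlying Apache HTTP client (very verbose)
    Logger.getLogger("org.apache.http.wire").setLevel(Level.DEBUG);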

millems commented 7 years ago

Closing for now, please feel free to reopen if you feel there's an issue with the SDK.

igracia commented 7 years ago

@millems We recently ran into this. It happens across other SDK flavors as well. In this thread, one user reports having the issue fixed by reverting to signature version 2. I don't feel that's a viable solution, as it isn't supported across all regions. I'm sorry, but I can't provide a reproducer. Since in our case it was also a small file (~120 KB), as in @yuriti89's case, could it be something related to the size of the files?

millems commented 7 years ago

It's super interesting that sig v2 is reported to fix it. It doesn't sound like a Java SDK issue if it's happening across SDKs. I understand it's hard to reproduce, but if we can get good wire logs for the issue, that will give us something to work with.

@igracia How frequently does it occur for you?

igracia commented 7 years ago

@millems Just once. No matter how much load I put on the server, or what size the files are, I haven't been able to reproduce it, sorry. Would the IdleConnectionReaper.java help prevent this from happening in the future?

millems commented 7 years ago

@igracia Thanks for trying to reproduce it. That's been our experience as well - it just doesn't happen when you're observing it.

The IdleConnectionReaper will only clean up connections that are not "checked out" from the connection pool for use after a period of time. In this case, we have a connection "checked out" (otherwise we couldn't get a 400), but something seems to be preventing the SDK from writing the entire message payload. The holy grail would be a tcpdump running when this error happens, but even getting debug-level logs has been difficult.
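For reference, the 1.x ClientConfiguration does expose a few knobs around pooled-connection lifetime; this is a sketch with illustrative values, not a recommendation for this particular issue:

    import com.amazonaws.ClientConfiguration;

    ClientConfiguration config = new ClientConfiguration()
            .withConnectionMaxIdleMillis(30_000)  // evict connections that sit idle in the pool for 30 s
            .withConnectionTTL(60_000)            // cap the total lifetime of a pooled connection at 60 s
            .withReaper(true);                    // background thread that closes idle pooled connections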

igracia commented 7 years ago

@millems We'll set the log level to debug for the AWS client in non-prod environments. If it happens again, we'll have some better info to work with! The thing is, in almost a year of use, it has only occurred once.

millems commented 7 years ago

@igracia Thank you very much! Some retries are expected to happen sometimes. We definitely try not to have any intermittent issues, but it's really hard to guarantee that nothing will ever go wrong in processing a request. Is there any impact in your use-case when something like this happens, other than log scan alarms and the additional latency of the retry? If there's something we can do better when this does happen, I want to make sure we don't miss it.

millems commented 7 years ago

Closing until logs are available.

igracia commented 7 years ago

@millems To answer this question:

Is there any impact in your use-case when something like this happens, other than log scan alarms and the additional latency of the retry? If there's something we can do better when this does happen, I want to make sure we don't miss it.

There's no other impact on the service beyond that. However, we haven't confirmed this yet, as the issue still hasn't repeated itself. I'm not sure we'll even see the error message now that we have correctly configured the retry mechanism, though.

igracia commented 6 years ago

@millems This has happened again a couple of times. Here are some logs, in case they help you:

[2018-06-11T22:02:02.572-07:00] [s3-thr-17] DEBUG com.amazonaws.request  - Sending Request: PUT https://my-bucket.s3.amazonaws.com /6212tagValue/objectKey.mkv Headers: (User-Agent: aws-sdk-java/1.11.337 Linux/4.9.76-3.78.amzn1.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.161-b12 java/1.8.0_161, x-amz-tagging: OneTag=tagValue&MediaType=video&ContainerFormat=mkv, amz-sdk-invocation-id: xxxxxxxxxxxx, Content-Length: 1928042, Content-MD5: xxxxxxxxxx, Content-Type: video/x-matroska, )
[2018-06-11T22:02:02.572-07:00] [s3-thr-17] DEBUG com.amazonaws.auth.AWS4Signer  - AWS4 Canonical Request: '"PUT
/6212tagValue/objectKey.mkv

amz-sdk-invocation-id:xxxxxxxxxx
amz-sdk-retry:0/0/500
content-length:1928042
content-md5:XXXXXXXXXXXXX
content-type:video/x-matroska
host:my-bucket.s3.amazonaws.com
user-agent:aws-sdk-java/1.11.337 Linux/4.9.76-3.78.amzn1.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.161-b12 java/1.8.0_161
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20180612T050202Z
x-amz-security-token:XXXXXXXXXXXXXXXXX
x-amz-tagging:OneTag=tagValue&MediaType=video&ContainerFormat=mkv

amz-sdk-invocation-id;amz-sdk-retry;content-length;content-md5;content-type;host;user-agent;x-amz-content-sha256;x-amz-date;x-amz-security-token;x-amz-tagging
UNSIGNED-PAYLOAD"
[2018-06-11T22:02:02.572-07:00] [s3-thr-17] DEBUG com.amazonaws.auth.AWS4Signer  - AWS4 String to Sign: '"AWS4-HMAC-SHA256
20180612T050202Z
20180612/us-east-1/s3/aws4_request
XXXXXXXXXXXX"
[2018-06-11T22:02:02.572-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - connecting to my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443
[2018-06-11T22:02:02.572-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Connecting socket to my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443 with timeout 1000
[2018-06-11T22:02:02.722-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Enabled protocols: [TLSv1, TLSv1.1, TLSv1.2]
[2018-06-11T22:02:02.722-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Enabled cipher suites:[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
[2018-06-11T22:02:02.722-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - socket.getSupportedProtocols(): [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2], socket.getEnabledProtocols(): [TLSv1, TLSv1.1, TLSv1.2]
[2018-06-11T22:02:02.722-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - TLS protocol enabled for SSL handshake: [TLSv1.2, TLSv1.1, TLSv1]
[2018-06-11T22:02:02.722-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Starting handshake
[2018-06-11T22:02:02.744-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Enabled protocols: [TLSv1, TLSv1.1, TLSv1.2]
[2018-06-11T22:02:02.745-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Enabled cipher suites:[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
[2018-06-11T22:02:02.745-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - socket.getSupportedProtocols(): [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2], socket.getEnabledProtocols(): [TLSv1, TLSv1.1, TLSv1.2]
[2018-06-11T22:02:02.745-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - TLS protocol enabled for SSL handshake: [TLSv1.2, TLSv1.1, TLSv1]
[2018-06-11T22:02:02.745-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Starting handshake
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Secure session established
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  negotiated protocol: TLSv1.2
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  peer principal: CN=*.s3.amazonaws.com, O=Amazon.com Inc., L=Seattle, ST=Washington, C=US
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  peer alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  issuer principal: CN=DigiCert Baltimore CA-2 G2, OU=www.digicert.com, O=DigiCert Inc, C=US
[2018-06-11T22:02:03.035-07:00] [s3-thr-16] DEBUG com.amazonaws.internal.SdkSSLSocket  - created: my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  - Secure session established
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  negotiated protocol: TLSv1.2
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  peer principal: CN=*.s3.amazonaws.com, O=Amazon.com Inc., L=Seattle, ST=Washington, C=US
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  peer alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG c.a.h.conn.ssl.SdkTLSSocketFactory  -  issuer principal: CN=DigiCert Baltimore CA-2 G2, OU=www.digicert.com, O=DigiCert Inc, C=US
[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG com.amazonaws.internal.SdkSSLSocket  - created: my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443
[2018-06-11T22:03:23.054-07:00] [java-sdk-http-connection-reaper] DEBUG com.amazonaws.internal.SdkSSLSocket  - shutting down output of my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443
[2018-06-11T22:03:23.054-07:00] [java-sdk-http-connection-reaper] DEBUG com.amazonaws.internal.SdkSSLSocket  - closing my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443

I think the relevant part is when it closes the connection, right at the end:

[2018-06-11T22:02:03.105-07:00] [s3-thr-17] DEBUG com.amazonaws.internal.SdkSSLSocket  - created: my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443
[2018-06-11T22:03:23.054-07:00] [java-sdk-http-connection-reaper] DEBUG com.amazonaws.internal.SdkSSLSocket  - shutting down output of my-bucket.s3.amazonaws.com/XX.XX.XX.XX:443

Somehow this never gets retried.

kartikeyachoudhary commented 4 years ago

Was this ever fixed?

biajoeknee commented 2 years ago

This is happening to me currently, not with this library but with the Swift SDK. I'm hoping that if this was fixed here, the issue may be related and the solution applicable; and if not, since I can reliably reproduce the error, I can provide insight into what's causing it for me and perhaps help resolve it for all SDK users once and for all, as I've found this problem occurs across other AWS SDK libraries as well.

mohitjain1609 commented 6 months ago

For me, the issue depends on file size: whenever I upload a file below 300 KB to the S3 bucket using the MuleSoft connector, the upload succeeds, but whenever the file is above 300 KB I get this timeout error. If anyone has any suggestions, please let me know.