Closed oliverdain closed 5 years ago
Thanks for reporting! We are aware of the issue. It is caused by the fact that the service can close a connection after a certain amount of time, and we are actively working on a fix.
Thanks for the quick response @zoewangg . I'm actively looking for workarounds because this is causing issues in customer facing code for us. When you say, "service can close connection after certain time" do you mean this is a server-side S3 bug? I was considering replacing our use of the AWS SDK with just a plain HTTP client but if this issue is server side that obviously won't help.
Is there anything we can do to work around this? I tried a few obvious things like putting short delays between file downloads but that didn't help.
Any short-term workaround advice would be greatly appreciated.
Also, a very rough timeline would also be great. I know you can't promise a date, but if you're working on this now and think you'll have a fix in days it doesn't make sense for me to spend a lot of time on workarounds. OTOH, if nobody is going to start working on this bug for a few weeks I need to do something.
No, it's a bug in the SDK. Would v2 sync clients work for you?
I think this exception would get retried and should succeed in the next attempt, is it not the case for you?
> Would v2 sync clients work for you?
We could spawn threads and use sync clients. A bit less efficient, but it would be OK.
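The thread-pool-plus-sync-client pattern can be sketched with the JDK alone. In this sketch the blocking S3 call is replaced by a stand-in `download` function (a blocking `S3Client.getObject` would go where that method is), so only the concurrency pattern is shown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncClientPool {
    // Stand-in for a blocking call such as S3Client.getObject(...).
    static String download(String key) {
        return "contents-of-" + key;
    }

    static List<String> downloadAll(List<String> keys, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one blocking download per key; the pool caps parallelism.
            List<Future<String>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(pool.submit(() -> download(key)));
            }
            // Collect results in submission order, blocking as needed.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(downloadAll(Arrays.asList("a.jpg", "b.jpg"), 4));
    }
}
```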
> I think this exception would get retried and should succeed in the next attempt, is it not the case for you?
What we see is that with retries enabled this bug is less common but still appears quite often (e.g. every 250 files instead of every 100).
Even if retries did work, we couldn't use them. Our use case is to create a zip file containing the data from all of these photos in a streaming way, without holding it all in RAM, so retries aren't an option for us: they'd produce a corrupt file, because the archive would contain the partial entry from the failed attempt and then, on retry, a repeated copy of the same data.
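To make the constraint concrete: a streamed zip is built through a ZipOutputStream that pushes bytes straight to the outgoing stream and cannot rewind, so bytes flushed by a failed attempt can never be unsent. A stdlib-only sketch of the pattern, with in-memory byte arrays standing in for the S3 object streams and the HTTP response:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StreamingZip {
    // Streams each source into the zip as it arrives. Once bytes for an entry
    // are written they cannot be taken back, which is why a mid-download retry
    // would leave a truncated chunk followed by a repeated copy of the data.
    static byte[] zip(Map<String, byte[]> sources) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the HTTP response stream
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            for (Map.Entry<String, byte[]> e : sources.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                InputStream in = new ByteArrayInputStream(e.getValue()); // stands in for the S3 object stream
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    zos.write(buf, 0, n); // streamed straight through; nothing buffered in RAM
                }
                zos.closeEntry();
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> sources = new LinkedHashMap<>();
        sources.put("photo-1.jpg", new byte[]{1, 2, 3});
        System.out.println(zip(sources).length + " bytes");
    }
}
```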
Thanks for the information. Please switch to v2 sync clients if that's an easy workaround for you. We don't have a firm date for the fix, but it is one of our top priorities right now.
Update: unfortunately, it will take longer than expected to fix this issue.
UPDATE: Please note that using HTTP is not as secure as HTTPS and is vulnerable to attacks (see https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP). We do not recommend using it.
We are also experiencing this issue with:
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.5.6</version>
</dependency>
Our use case is a job downloading around 20 files per execution, using the S3AsyncClient. We get this issue around every 3rd to 5th job execution.
Thanks for the temporary workaround, @zoewangg
Any update? We are hitting this issue too.
We tried implementing the workaround suggested. How do we create the override URI for different regions?
Update: we reached out to the service team and we are still working on finding the right solution for it.
Please note that using HTTP is not as secure as HTTPS and is vulnerable to attacks (see https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP). We do not recommend using it.
@vvellanki You can find all S3 endpoints in https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
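For most commercial regions, the endpoints on that page follow a predictable `s3.<region>.amazonaws.com` pattern, so an override URI can be derived from the region name. A small sketch (the helper is hypothetical; China and GovCloud partitions use different domains, so check the endpoint table for those):

```java
import java.net.URI;

public class S3Endpoints {
    // Derives the regional S3 endpoint URI following the pattern from the
    // AWS General Reference endpoint table. Special partitions (cn-*,
    // us-gov-*) use different domains and must be looked up in the table.
    static URI s3Endpoint(String region) {
        return URI.create("https://s3." + region + ".amazonaws.com");
    }

    public static void main(String[] args) {
        System.out.println(s3Endpoint("eu-west-1"));
        // The URI would then be passed to the client builder, e.g. via
        // .endpointOverride(s3Endpoint("eu-west-1"))
    }
}
```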
The options I see are:
1) Use http instead of https - does not work for us
2) Use sync v2 APIs instead of async - in our case, the change doesn't appear trivial due to dependencies. We hit some issues due to using an older httpclient jar
3) Wait for a fix for this issue
How about restricting the number of concurrent calls to a fixed number? Would that be a workaround?
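Capping in-flight calls at the application level is straightforward with a Semaphore wrapped around each request. A minimal stdlib sketch, with a `Supplier<CompletableFuture<T>>` standing in for an S3AsyncClient call (whether this actually avoids the stale-connection race is not guaranteed):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class ConcurrencyCap {
    private final Semaphore permits;

    ConcurrencyCap(int max) {
        this.permits = new Semaphore(max);
    }

    // Acquires a permit before starting the call and releases it when the
    // future completes, successfully or not, so at most `max` calls are
    // in flight at once.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> call) throws InterruptedException {
        permits.acquire();
        CompletableFuture<T> f = call.get();
        f.whenComplete((result, error) -> permits.release());
        return f;
    }

    public static void main(String[] args) throws Exception {
        ConcurrencyCap cap = new ConcurrencyCap(2);
        System.out.println(cap.submit(() -> CompletableFuture.completedFuture("done")).join());
    }
}
```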
Summary: The root cause is that the service may close connections that have been idle for a certain amount of time, and there may be a race condition in which a soon-to-be-stale connection is picked up by the SDK and closed by the service at the same time the SDK sends out the request.
Reducing connectionMaxIdleTime should reduce the chances of occurrences of this issue.
@vvellanki Can you try updating connectionMaxIdleTime to 5 seconds to see if it helps?
s3Async = S3AsyncClient.builder()
                       .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                                                                 .connectionMaxIdleTime(Duration.ofSeconds(5)))
                       .build();
We are experiencing a similar issue, but with DynamoDB:
The application is a worker, consuming from kinesis, using KCL, and reading and writing to dynamo.
The stacktrace when using aws-sdk-java-v2 2.5.11:
Note: when we tried aws-sdk 2.5.25, the exception is not thrown, but the code is stuck forever trying to read/write from DynamoDB (we call .await() inside a GlobalScope.async {}, waiting for the DynamoDB call to complete).
software.amazon.awssdk.core.exception.SdkClientException: null
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:123)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:105)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:241)
at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:237)
at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$writeRequest$6(NettyRequestExecutor.java:191)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:991)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:873)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1365)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:113)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
at io.netty.handler.ssl.SslHandler.finishWrap(SslHandler.java:894)
at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:880)
at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:797)
at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:778)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.flush(CombinedChannelDuplexHandler.java:533)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.CombinedChannelDuplexHandler.flush(CombinedChannelDuplexHandler.java:358)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.handler.logging.LoggingHandler.flush(LoggingHandler.java:265)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
at com.typesafe.netty.http.HttpStreamsHandler.completeBody(HttpStreamsHandler.java:298)
at com.typesafe.netty.http.HttpStreamsHandler.access$400(HttpStreamsHandler.java:14)
at com.typesafe.netty.http.HttpStreamsHandler$3$1.run(HttpStreamsHandler.java:276)
at com.typesafe.netty.http.HttpStreamsHandler.executeInEventLoop(HttpStreamsHandler.java:342)
at com.typesafe.netty.http.HttpStreamsHandler.access$300(HttpStreamsHandler.java:14)
at com.typesafe.netty.http.HttpStreamsHandler$3.complete(HttpStreamsHandler.java:273)
at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:244)
at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:241)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndCompleteAll(AbstractCoalescingBufferQueue.java:340)
at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndFailAll(AbstractCoalescingBufferQueue.java:207)
at io.netty.handler.ssl.SslHandler.releaseAndFailAll(SslHandler.java:1591)
at io.netty.handler.ssl.SslHandler.access$900(SslHandler.java:166)
at io.netty.handler.ssl.SslHandler$5.run(SslHandler.java:1785)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.lang.Thread.run(Thread.java:748)
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)
Are these issues related? I'm thinking of using this library: https://github.com/bradynpoulsen/aws-sdk-kotlin-coroutines/, which is a wrapper for the aws-sdk-java v1 with out-of-the-box coroutine support. It should be less efficient, but it may work.
I tried using http as the protocol in the DynamoDB endpoint and also set connectionMaxIdleTime to 5 seconds, without success. The problem happens after the application has been running for a few minutes when using KCL v2, and after a longer period when using KCL v1.
I think we've experienced this same behavior before with S3, but we switched to the S3 client from AWS SDK v1 and the issue went away. Now we are experiencing the issue with DynamoDB.
Environment:
I've experienced this issue as well. We are downloading the same list of files (about 60 small JSON files) every 5 seconds (to check for changes).
The change proposed by @zoewangg does seem to mitigate this issue.
This issue seems fairly bad. The following code fails after a few iterations:
try (S3AsyncClient client = S3AsyncClient.create()) {
    client.putObject(r -> r.bucket("tmp.millems").key("test"), AsyncRequestBody.fromBytes("".getBytes())).join();
    for (int i = 0; i < 1000; ++i) {
        System.out.println("Iteration " + i);
        Path outputPath = Paths.get("/tmp/foo");
        Files.deleteIfExists(outputPath);
        client.getObject(r -> r.bucket("tmp.millems").key("test"), AsyncResponseTransformer.toFile(outputPath)).join();
    }
}
SDK retries don't seem to help fix the problem.
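When the SDK's in-flight retry can't recover, one application-level hedge is to retry the entire operation from scratch, with a fresh request and a fresh output target, so no partially consumed stream survives a failed attempt. A generic stdlib sketch (a hypothetical helper, not an SDK API):

```java
import java.util.concurrent.Callable;

public class WholeRequestRetry {
    // Re-runs the entire operation on failure. Unlike resuming a partially
    // consumed stream, each attempt starts from a clean slate (e.g. the
    // Callable would delete and re-create its output file), so no partial
    // data leaks into the result.
    static <T> T retry(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again from scratch
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(retry(() -> "ok", 3));
    }
}
```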
Failure logs:
2019-04-16 16:34:21,476 [aws-java-sdk-NettyEventLoop-1-4] DEBUG software.amazon.awssdk.request - Retryable error detected, will retry in 311ms, attempt number 3
2019-04-16 16:34:21,477 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0xf04bce0b, L:/10.94.46.143:56881 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] CLOSE
2019-04-16 16:34:21,477 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0xf04bce0b, L:/10.94.46.143:56881 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] UNREGISTERED
2019-04-16 16:34:21,788 [sdk-ScheduledExecutor-2-1] DEBUG software.amazon.awssdk.request - Retrying Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=s3.eu-west-1.amazonaws.com, port=443, encodedPath=/tmp.millems/test, headers=[amz-sdk-invocation-id, User-Agent, x-amz-te], queryParameters=[])
2019-04-16 16:34:21,791 [sdk-ScheduledExecutor-2-1] DEBUG software.amazon.awssdk.auth.signer.Aws4Signer - AWS4 String to sign: AWS4-HMAC-SHA256
20190416T233421Z
20190416/eu-west-1/s3/aws4_request
7f129f3fe741eebd1b6aba5a36006198c2cb6275da89198fd117ce9730c2eeae
2019-04-16 16:34:21,797 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979] REGISTERED
2019-04-16 16:34:21,798 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979] CONNECT: s3.eu-west-1.amazonaws.com/52.218.24.211:443
2019-04-16 16:34:21,965 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] ACTIVE
2019-04-16 16:34:21,982 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE: software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest(DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET https://s3.eu-west-1.amazonaws.com/tmp.millems/test HTTP/1.1
amz-sdk-invocation-id: 4126936f-2cb5-a0e6-ae52-e5f1e122ff4a
amz-sdk-retry: 3/311/485
Authorization: xxx
Host: s3.eu-west-1.amazonaws.com
User-Agent: aws-sdk-java/2.5.24-SNAPSHOT Mac_OS_X/10.12.6 Java_HotSpot_TM__64-Bit_Server_VM/25.202-b08 Java/1.8.0_202 vendor/Oracle_Corporation
x-amz-content-sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20190416T233421Z
x-amz-te: append-md5)
2019-04-16 16:34:21,982 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE, DefaultHttpContent(data: EmptyByteBufBE, decoderResult: success), 0B
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:22,154 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,168 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,174 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,184 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,192 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,359 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.ssl.SslHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] HANDSHAKEN: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
2019-04-16 16:34:22,360 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] USER_EVENT: SslHandshakeCompletionEvent(SUCCESS)
2019-04-16 16:34:22,364 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE, EmptyLastHttpContent, 0B
2019-04-16 16:34:22,367 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:22,369 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,599 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ: DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
x-amz-id-2: SkrTAHIrukTmY1Pm6f6fMJqTzAGJrtfbl6khnC/LWoPjd4lyxQlhjtzsOTeEpqZ130INC2fp9PE=
x-amz-request-id: FE32414B3EFE3E86
Date: Tue, 16 Apr 2019 23:34:23 GMT
Last-Modified: Tue, 16 Apr 2019 23:34:02 GMT
ETag: "d41d8cd98f00b204e9800998ecf8427e"
x-amz-transfer-encoding: append-md5
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 16
Server: AmazonS3
2019-04-16 16:34:22,600 [aws-java-sdk-NettyEventLoop-1-2] DEBUG software.amazon.awssdk.request - Received successful response: 200
2019-04-16 16:34:22,601 [aws-java-sdk-NettyEventLoop-1-2] DEBUG software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain - Interceptor 'software.amazon.awssdk.services.s3.internal.handlers.EnableTrailingChecksumInterceptor@74294adb' modified the message with its modifyResponse method.
2019-04-16 16:34:22,602 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 16, cap: 16/16, unwrapped: PooledUnsafeDirectByteBuf(ridx: 16, widx: 16, cap: 45)), decoderResult: success), 16B
+-------------------------------------------------+
| 0 1 2 3 4 5 6 7 8 9 a b c d e f |
+--------+-------------------------------------------------+----------------+
|00000000| d4 1d 8c d9 8f 00 b2 04 e9 80 09 98 ec f8 42 7e |..............B~|
+--------+-------------------------------------------------+----------------+
2019-04-16 16:34:22,602 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:28,643 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] USER_EVENT: SslCloseCompletionEvent(SUCCESS)
2019-04-16 16:34:28,644 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:29,065 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:29,067 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] INACTIVE
2019-04-16 16:34:29,184 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.buffer.PoolThreadCache - Freed 19 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-4
2019-04-16 16:34:29,184 [aws-java-sdk-NettyEventLoop-1-3] DEBUG io.netty.buffer.PoolThreadCache - Freed 22 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-3
2019-04-16 16:34:29,192 [aws-java-sdk-NettyEventLoop-1-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 18 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-1
2019-04-16 16:34:29,193 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] CLOSE
2019-04-16 16:34:29,194 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] UNREGISTERED
2019-04-16 16:34:31,607 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 19 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-2
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException
...
It looks like the SDK might not be finalizing the response properly, because the debug log shows Content-Length: 16 as well as all 16 bytes being returned.
Update 1: ResponseHandler > PublisherAdapter > subscribe() > new Subscriber > onComplete() is not being invoked in the case where an exception is thrown. It should be invoked. We can tell it's the end of the stream based on the fact that ResponseHandler > PublisherAdapter > subscribe() > new Subscriber > onNext(HttpContent) was invoked with a LastHttpContent.
Update 2: On a good call, where LastHttpContent triggers the onComplete, onNext is invoked this way:
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1336)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:268)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:222)
at com.typesafe.netty.HandlerPublisher.publishMessage(HandlerPublisher.java:367)
at com.typesafe.netty.HandlerPublisher.channelRead(HandlerPublisher.java:345)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at com.typesafe.netty.http.HttpStreamsHandler.handleReadHttpContent(HttpStreamsHandler.java:184)
at com.typesafe.netty.http.HttpStreamsHandler.channelRead(HttpStreamsHandler.java:165)
at com.typesafe.netty.http.HttpStreamsClientHandler.channelRead(HttpStreamsClientHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
...
On a bad call, where LastHttpContent does not trigger the onComplete, onNext is invoked this way:
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1336)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:268)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:222)
at com.typesafe.netty.HandlerPublisher.publishMessage(HandlerPublisher.java:367)
at com.typesafe.netty.HandlerPublisher.flushBuffer(HandlerPublisher.java:304)
at com.typesafe.netty.HandlerPublisher.receivedDemand(HandlerPublisher.java:258)
at com.typesafe.netty.HandlerPublisher.access$200(HandlerPublisher.java:41)
at com.typesafe.netty.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:452)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.lang.Thread.run(Thread.java:748)
Update 3: It looks like we're dropping downstream publisher demand in the checksum validator.
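The "dropped demand" failure mode is easy to demonstrate with the JDK's own Reactive Streams types: if demand (`Subscription.request(n)`) is swallowed instead of propagated upstream, onNext and onComplete simply never fire and the stream appears to hang. A minimal illustration, unrelated to the SDK's internals but showing the same contract:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class DemandDemo {
    // A subscriber that only signals demand when `signalDemand` is true.
    // With demand, all items and onComplete arrive; without it, nothing
    // does, which is exactly how a stream "hangs" when demand is dropped.
    static List<Integer> consume(boolean signalDemand) throws InterruptedException {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<Integer> pub = new SubmissionPublisher<>()) {
            pub.subscribe(new Flow.Subscriber<Integer>() {
                public void onSubscribe(Flow.Subscription s) {
                    if (signalDemand) {
                        s.request(Long.MAX_VALUE);
                    } // else: demand is dropped, so onNext is never called
                }
                public void onNext(Integer item) { received.add(item); }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete() { done.countDown(); }
            });
            for (int i = 1; i <= 3; i++) {
                pub.submit(i);
            }
        } // close() signals onComplete once all items have been delivered
        done.await(1, TimeUnit.SECONDS); // times out when demand was dropped
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consume(true));  // all three items arrive
        System.out.println(consume(false)); // nothing arrives: demand dropped
    }
}
```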
Created #1210, which will help with some of these issues with S3AsyncClient.getObject.
An improvement for this issue will go out with tomorrow's release. Please test it out and see if it fixes the problem for you.
A change has gone out. Can you please update to the latest client version and see if you're still seeing the problem?
Tried with the latest build 2.5.29. Still seeing the same issue.
Caused By (java.io.IOException) Server failed to send complete response
    software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted():398
    software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive():148
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.handler.logging.LoggingHandler.channelInactive():167
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
    io.netty.handler.timeout.IdleStateHandler.channelInactive():277
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive():420
    io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed():377
    io.netty.handler.codec.ByteToMessageDecoder.channelInactive():342
    io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive():282
    io.netty.channel.CombinedChannelDuplexHandler.channelInactive():223
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed():377
    io.netty.handler.codec.ByteToMessageDecoder.channelInactive():342
    io.netty.handler.ssl.SslHandler.channelInactive():1010
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75
    io.netty.handler.timeout.IdleStateHandler.channelInactive():277
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224
    io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive():1409
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231
    io.netty.channel.DefaultChannelPipeline.fireChannelInactive():927
    io.netty.channel.AbstractChannel$AbstractUnsafe$8.run():822
    io.netty.util.concurrent.AbstractEventExecutor.safeExecute():163
    io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks():404
    io.netty.channel.nio.NioEventLoop.run():463
    io.netty.util.concurrent.SingleThreadEventExecutor$5.run():886
    java.lang.Thread.run():748
@vvellanki Thanks for checking. Were you able to update connectionMaxIdleTime to 5 seconds to see if it helps?
We want to see if making this the default will fix the problem for customers with access patterns like yours.
@millems I still face the issue with connectionMaxIdleTime reduced to 5 seconds, on the latest version, 2.5.29:
S3AsyncClient.builder()
    .httpClientBuilder(NettyNioAsyncHttpClient.builder()
        .writeTimeout(Duration.ofSeconds(100))
        .readTimeout(Duration.ofSeconds(100))
        .connectionMaxIdleTime(Duration.ofSeconds(5)))
    .build()
@Kumarnitin2001 Thanks for the confirmation.
@millems Is there any news regarding this issue? We would rather not restart our app once a day (if possible).
We haven't been able to reproduce this specific issue with connectionMaxIdleTime(Duration.ofSeconds(5)) set; that setting fixed it in all of our scenarios. We're still trying to reproduce the original failure. If you have a repro case, that would be fantastic.
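For anyone following along, here is a minimal sketch of the workaround being discussed. The builder classes are real SDK types, but treat this as illustrative rather than a blessed configuration; the 5-second value is the one suggested in this thread.

```java
import java.time.Duration;

import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class ClientConfig {
    // Sketch of the workaround: close idle connections after 5 seconds so the
    // SDK is less likely to reuse a connection that S3 has already dropped.
    public static S3AsyncClient buildClient() {
        return S3AsyncClient.builder()
                .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                        .connectionMaxIdleTime(Duration.ofSeconds(5)))
                .build();
    }
}
```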
@millems this issue is still occurring for us, even with connectionMaxIdleTime(Duration.ofSeconds(5)) set.
An example from our logs:
App start: 19-05-22 12:22:55
1st error: 19-05-22 16:26:05 (list-objects timeout)
App stop (manually): 19-05-24 17:33:24
During the above run time, there were 56 list-objects timeouts and 21 get-objects timeouts (see below). We add a 5-second timeout to each of our requests, after which we stop waiting and return an error.
19-05-22 16:26:05 ERROR - :list-objects!
19-05-22 17:16:53 ERROR - :list-objects!
19-05-22 17:53:26 ERROR - :list-objects!
19-05-22 20:38:37 ERROR - :list-objects!
19-05-22 22:44:16 ERROR - :list-objects!
19-05-22 23:33:55 ERROR - :list-objects!
19-05-23 00:07:52 ERROR - :list-objects!
19-05-23 00:20:22 ERROR - :list-objects!
19-05-23 00:53:55 ERROR - :list-objects!
19-05-23 02:20:00 ERROR - :list-objects!
19-05-23 04:11:37 ERROR - :list-objects!
19-05-23 06:38:47 ERROR - :list-objects!
19-05-23 09:42:29 ERROR - :list-objects!
19-05-23 12:36:40 ERROR - :list-objects!
19-05-23 14:16:17 ERROR - :list-objects!
19-05-23 17:09:57 ERROR - :list-objects!
19-05-23 18:23:32 ERROR - :list-objects!
19-05-23 20:04:08 ERROR - :list-objects!
19-05-23 21:17:39 ERROR - :get-object!
19-05-23 21:17:39 ERROR - :get-object!
19-05-23 23:33:50 ERROR - :list-objects!
19-05-24 00:41:25 ERROR - :list-objects!
19-05-24 01:03:38 ERROR - :get-object!
19-05-24 01:15:51 ERROR - :get-object!
19-05-24 01:15:51 ERROR - :get-object!
19-05-24 01:15:51 ERROR - :get-object!
19-05-24 01:16:01 ERROR - :get-object!
19-05-24 01:16:01 ERROR - :get-object!
19-05-24 01:45:57 ERROR - :get-object!
19-05-24 01:56:33 ERROR - :get-object!
19-05-24 05:05:30 ERROR - :get-object!
19-05-24 05:28:07 ERROR - :list-objects!
19-05-24 05:33:37 ERROR - :list-objects!
19-05-24 06:39:46 ERROR - :list-objects!
19-05-24 09:21:58 ERROR - :get-object!
19-05-24 09:22:08 ERROR - :list-objects!
19-05-24 09:26:56 ERROR - :list-objects!
19-05-24 09:35:57 ERROR - :list-objects!
19-05-24 11:04:48 ERROR - :get-object!
19-05-24 11:05:24 ERROR - :list-objects!
19-05-24 11:05:34 ERROR - :list-objects!
19-05-24 11:05:44 ERROR - :get-object!
19-05-24 11:05:54 ERROR - :list-objects!
19-05-24 11:06:04 ERROR - :list-objects!
19-05-24 11:06:14 ERROR - :list-objects!
19-05-24 11:06:35 ERROR - :list-objects!
19-05-24 11:06:50 ERROR - :list-objects!
19-05-24 11:07:00 ERROR - :list-objects!
19-05-24 11:07:15 ERROR - :list-objects!
19-05-24 11:07:41 ERROR - :list-objects!
19-05-24 11:07:56 ERROR - :list-objects!
19-05-24 11:08:06 ERROR - :list-objects!
19-05-24 11:08:21 ERROR - :list-objects!
19-05-24 11:08:36 ERROR - :get-object!
19-05-24 12:04:21 ERROR - :get-object!
19-05-24 13:27:13 ERROR - :get-object!
19-05-24 13:27:23 ERROR - :get-object!
19-05-24 15:16:44 ERROR - :get-object!
19-05-24 15:16:44 ERROR - :get-object!
19-05-24 15:16:44 ERROR - :get-object!
19-05-24 15:17:04 ERROR - :list-objects!
19-05-24 15:17:14 ERROR - :list-objects!
19-05-24 15:17:29 ERROR - :list-objects!
19-05-24 15:17:50 ERROR - :list-objects!
19-05-24 15:18:16 ERROR - :list-objects!
19-05-24 15:18:26 ERROR - :list-objects!
19-05-24 15:18:36 ERROR - :list-objects!
19-05-24 15:19:02 ERROR - :list-objects!
19-05-24 15:19:17 ERROR - :list-objects!
19-05-24 15:19:27 ERROR - :list-objects!
19-05-24 15:19:48 ERROR - :list-objects!
19-05-24 15:20:08 ERROR - :list-objects!
19-05-24 15:20:18 ERROR - :list-objects!
19-05-24 15:20:28 ERROR - :list-objects!
19-05-24 15:20:38 ERROR - :list-objects!
19-05-24 16:12:20 ERROR - :list-objects!
19-05-24 16:18:46 ERROR - :list-objects!
I'm still seeing this reliably on 2.5.53, after downloading small files from S3 with several concurrent requests for a couple of minutes.
I can confirm this is still an issue. Affected version: 2.5.54
The async S3 client leaks native memory (or heap-allocated buffers, if you tell Netty not to use direct memory); the bundled FileAsyncResponseTransformer is affected. We derived a throttled implementation from FileAsyncResponseTransformer, essentially backpressure (pulling more data only once the current buffer has been processed), and that thing leaks memory like there is no tomorrow. Please fix it; we had to move to the sync client.
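The throttling described there (request the next chunk only after the current one is processed) is the standard Reactive Streams backpressure pattern. As a generic, SDK-free illustration, here is a sketch using java.util.concurrent.Flow; the ThrottledSubscriber name is hypothetical and this is not the actual FileAsyncResponseTransformer internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.TimeUnit;

// Pull-one-item-at-a-time subscriber: requests the next item only after the
// current one has been fully processed, so the producer can never build up an
// unbounded backlog of buffers in memory.
class ThrottledSubscriber<T> implements Flow.Subscriber<T> {
    private final List<T> received = new ArrayList<>();
    private final CountDownLatch done = new CountDownLatch(1);
    private Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription s) {
        this.subscription = s;
        s.request(1);                // pull exactly one item to start
    }

    @Override public void onNext(T item) {
        received.add(item);          // "process" the current buffer...
        subscription.request(1);     // ...and only then pull the next one
    }

    @Override public void onError(Throwable t) { done.countDown(); }
    @Override public void onComplete() { done.countDown(); }

    // Blocks until the stream completes, then returns everything received.
    List<T> awaitAll() {
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```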
The client is completely broken as soon as you attach an NVMe drive to the instance; we're seeing https://github.com/aws/aws-sdk-java-v2/issues/1122. Try downloading anything above 1 GB. The non-SSL S3 endpoint appears to work better (no OOM).
@gibffe @marshallpierce @Eyal-Shalev
I was able to reproduce the issue by downloading an object above 1 GB using an instance with local NVMe storage. We have pushed out a fix (#1335) and it is available in 2.7.4. Could you check whether the problem still exists with the latest version?
I'm still getting the error with 2.7.4. Here is the code.
final Path tempFile = Files.createTempFile(workDirectory, tempFilePrefix, "tmp");
AwsCredentialsProvider creds = StaticCredentialsProvider.create(
        AwsBasicCredentials.create(awsKey, awsSecret));
S3AsyncClient client = S3AsyncClient.builder().credentialsProvider(creds).build();
final CompletableFuture<GetObjectResponse> futureGet = client.getObject(
        GetObjectRequest.builder()
                .bucket(s3Bucket)
                .key(s3Path)
                .build(),
        AsyncResponseTransformer.toFile(tempFile.toFile()));
futureGet.whenComplete((resp, err) -> {
    try {
        if (resp != null) {
            System.out.println(resp);
        } else {
            err.printStackTrace();
        }
    } finally {
        client.close();
    }
});
futureGet.join();
Here is the stack trace.
16:32:23,432 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
    at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage$$Lambda$260/1366089113.apply(Unknown Source)
    at java.util.concurrent.CompletableFuture$HandleCompletion.run(CompletableFuture.java:1403)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
    at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
    at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor$$Lambda$256/24632579.accept(Unknown Source)
    at java.util.concurrent.CompletableFuture$WhenCompleteCompletion.run(CompletableFuture.java:1298)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$exceptionCaught$0(ResponseHandler.java:142)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$$Lambda$362/1655983226.run(Unknown Source)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.exceptionCaught(ResponseHandler.java:142)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:856)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at com.typesafe.netty.HandlerPublisher.channelRead(HandlerPublisher.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
Caused by: software.amazon.awssdk.core.exception.SdkClientException
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97) [sdk-core.jar:]
    at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98) [sdk-core.jar:]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125) [sdk-core.jar:]
    ... 71 more
Caused by: java.lang.NullPointerException
    at software.amazon.awssdk.core.internal.http.async.AsyncStreamingResponseHandler.onHeaders(AsyncStreamingResponseHandler.java:56) [sdk-core.jar:]
    at software.amazon.awssdk.core.internal.http.async.AsyncAfterTransmissionInterceptorCallingResponseHandler.onHeaders(AsyncAfterTransmissionInterceptorCallingResponseHandler.java:67) [sdk-core.jar:]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onHeaders(MakeAsyncHttpRequestStage.java:235) [sdk-core.jar:]
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:94) [netty-nio-client.jar:]
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:65) [netty-nio-client.jar:]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-transport.jar:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport.jar:4.1.33.Final]
    ... 58 more
Hi @kranthigurram, that seems like a different issue. I created a separate issue #1343 to track it. We will take a look at it.
@zoewangg Thank you
Thanks @zoewangg for looking into this issue. The client seems to be stable now. 🤞 🤞 🤞 I'll run a soak test and report back.
@zoewangg I tried upgrading to 2.7.5, but now I'm getting Server failed to send complete response on ElasticLoadBalancingAsyncClient.registerInstancesWithLoadBalancer.
Edit: this only occurs sometimes, but it didn't happen as often on 2.5.29.
[software.amazon.awssdk.utils.CompletableFutureUtils errorAsCompletionException "CompletableFutureUtils.java" 61]
[software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl build "SdkClientException.java" 97]
[software.amazon.awssdk.http.nio.netty.internal.ResponseHandler notifyIfResponseNotCompleted "ResponseHandler.java" 396]
[software.amazon.awssdk.http.nio.netty.internal.ResponseHandler channelInactive "ResponseHandler.java" 148]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.handler.logging.LoggingHandler channelInactive "LoggingHandler.java" 167]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
[io.netty.handler.timeout.IdleStateHandler channelInactive "IdleStateHandler.java" 277]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext fireChannelInactive "CombinedChannelDuplexHandler.java" 420]
[io.netty.handler.codec.ByteToMessageDecoder channelInputClosed "ByteToMessageDecoder.java" 390]
[io.netty.handler.codec.ByteToMessageDecoder channelInactive "ByteToMessageDecoder.java" 355]
[io.netty.handler.codec.http.HttpClientCodec$Decoder channelInactive "HttpClientCodec.java" 282]
[io.netty.channel.CombinedChannelDuplexHandler channelInactive "CombinedChannelDuplexHandler.java" 223]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.handler.codec.ByteToMessageDecoder channelInputClosed "ByteToMessageDecoder.java" 390]
[io.netty.handler.codec.ByteToMessageDecoder channelInactive "ByteToMessageDecoder.java" 355]
[io.netty.handler.ssl.SslHandler channelInactive "SslHandler.java" 1054]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
[io.netty.handler.timeout.IdleStateHandler channelInactive "IdleStateHandler.java" 277]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
[io.netty.channel.DefaultChannelPipeline$HeadContext channelInactive "DefaultChannelPipeline.java" 1403]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
[io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
[io.netty.channel.DefaultChannelPipeline fireChannelInactive "DefaultChannelPipeline.java" 912]
[io.netty.channel.AbstractChannel$AbstractUnsafe$8 run "AbstractChannel.java" 826]
[io.netty.util.concurrent.AbstractEventExecutor safeExecute "AbstractEventExecutor.java" 163]
[io.netty.util.concurrent.SingleThreadEventExecutor runAllTasks "SingleThreadEventExecutor.java" 404]
[io.netty.channel.nio.NioEventLoop run "NioEventLoop.java" 495]
[io.netty.util.concurrent.SingleThreadEventExecutor$5 run "SingleThreadEventExecutor.java" 905]
[java.lang.Thread run "Thread.java" 834]
@Eyal-Shalev Server failed to send complete response can happen when the service closes the connection mid-request, which is valid per the HTTP spec. Normally a retry would succeed; is there any chance retry is disabled on ElasticLoadBalancingAsyncClient? If not, reducing connectionMaxIdleTime would help by closing idle connections more frequently. Are you still experiencing the issue with the S3 async client?
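If retry did get disabled somewhere, explicitly re-enabling it (or raising the attempt count) looks roughly like the sketch below. ClientOverrideConfiguration and RetryPolicy are real SDK classes, but the retry count here is illustrative, not a recommendation.

```java
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.services.elasticloadbalancing.ElasticLoadBalancingAsyncClient;

public class RetryConfig {
    // Sketch: configure the SDK's built-in retry policy explicitly rather
    // than relying on the default (which allows 3 retries).
    public static ElasticLoadBalancingAsyncClient buildClient() {
        return ElasticLoadBalancingAsyncClient.builder()
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .retryPolicy(RetryPolicy.builder()
                                .numRetries(10)   // illustrative value
                                .build())
                        .build())
                .build();
    }
}
```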
Guys, I'm still experiencing the same issue as the other devs. In my case it affects my Kinesis producer, which is ingesting a couple of thousand messages per second.
It is worth mentioning that I'm using a connectionMaxIdleTime of 5 seconds, as suggested earlier in this thread.
I'm using JDK 11 with the following libraries:
properties:
# VERSIONS
version.kotlin: 1.3.40
version.jackson: 2.9.9
version.aws: 2.7.15
version.aws-kinesis: 2.2.1
Here is the stack trace.
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1076)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:827)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
... 54 common frames omitted
We are also facing the same issue with async client version 2.7.13. Our use case involves making a few calls to S3 to serve UI requests, and we wanted to use the async client. We are getting this error quite frequently, at around the same rate as @dazito mentioned, and it is becoming difficult for us to constantly monitor and restart the service. @zoewangg, if there is no immediate fix planned for an upcoming version, we may have to try the S3 sync client and, if it works, do a performance impact analysis. Do you have any suggestions on the way forward?
@miere The suggested setting connectionMaxIdleTime = 5 seconds only applies to the S3 client. This GitHub issue tracks S3-specific issues; could you create a new issue for it (with a repro case if possible)? We will look into it.
@ybhatnagar To be clear, are you having the issue with the S3 sync client or the async client? Have you tried setting connectionMaxIdleTime to 5 seconds? What APIs are you using? Is retry disabled? (Retry is enabled by default.)
@zoewangg It seems our issue with the S3 connection was indeed resolved.
We are implementing retry ourselves (with exponential backoff), and to mitigate the issue with registerInstancesWithLoadBalancer we had to allow up to 10 retries (5 wasn't enough).
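A delay schedule for that kind of client-side exponential backoff might look like the following sketch; the Backoff class and its parameters are hypothetical, not part of the SDK or the commenter's actual code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical exponential-backoff schedule: the delay doubles after each
// failed attempt and is capped at maxDelay, for up to maxRetries retries.
final class Backoff {
    static List<Duration> schedule(Duration base, Duration maxDelay, int maxRetries) {
        List<Duration> delays = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            // base * 2^attempt, clamped to the cap
            long millis = Math.min(base.toMillis() << attempt, maxDelay.toMillis());
            delays.add(Duration.ofMillis(millis));
        }
        return delays;
    }
}
```

A retry loop would sleep for each delay in turn before reissuing the failed call, giving up after the list is exhausted.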
@zoewangg We haven't disabled retry; is it enabled by default?
@Eyal-Shalev Glad to hear the issue was resolved for the S3 client. Retry is enabled by default and the default max number of retries is 3.
Closing the issue due to no recent response. Feel free to open a new issue if you continue to see the error.
I'm facing this issue with version 2.5.61 and found this discussion.
I have one question: do I need to set connectionMaxIdleTime(Duration.ofSeconds(5)) after I update the dependency to version 2.9.11?
We just merged a change to update the default connectionMaxIdleTime for the S3 client via #1450, and it will be included in the next release. Until then, you'd still need to set connectionMaxIdleTime(Duration.ofSeconds(5)).
I'm seeing this issue in the latest version of the SDK. I'm reading from SQS via Alpakka using the async client. I tried setting the connectionMaxIdleTime
to 5 seconds and I'm still getting the same error.
Kotlin: 1.3.41
Netty & SQS Java SDK: 2.9.15
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1078)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:278)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:827)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
... 52 more
Caused by: java.io.IOException: Server failed to send complete response
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
... 43 more
Hi @jaylynstoesz, I see you're using SQS, so could you please open a new GitHub issue with sample code we can use to reproduce? Setting connectionMaxIdleTime
to 5 seconds is a suggestion tailored to S3 clients.
For SNS/SQS I have created this ticket https://github.com/aws/aws-sdk-java-v2/issues/1520 @jaylynstoesz
@millems @debora-ito @zoewangg
Hi. Reducing CONNECTION_MAX_IDLE_TIMEOUT to 5 seconds did not work as a workaround for me. I'm still having the following problem. Any solution?
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Channel was closed before it could be written to.
[S3 Configuration] Please let me know if any of these settings need to be changed because they are too low or too high.
@Configuration
@EnableConfigurationProperties(S3ClientConfig.S3ClientConfigurationProperties::class)
open class S3ClientConfig(
    private val properties: S3ClientConfigurationProperties
) {
    companion object {
        private const val HTTP_CLIENT_MAX_CONCURRENCY = 1000
        private const val MAX_PENDING_CONNECTION_ACQUIRE = 10000
        private val LOG = LoggerFactory.getLogger(this::class.java)
        private val CONNECTION_TIMEOUT = Duration.ofSeconds(2)
        private val CONNECTION_ACQUISITION_TIMEOUT = Duration.ofSeconds(2)
        private val CONNECTION_MAX_IDLE_TIME = Duration.ofSeconds(5)
        private val WRITE_TIMEOUT = Duration.ofSeconds(30)
        private val READ_TIMEOUT = Duration.ofSeconds(30)
    }

    @Bean
    open fun s3AsyncClient(): S3AsyncClient {
        val clientBuilder = S3AsyncClient.builder()
            .httpClient(getHttpClient())
            .serviceConfiguration(getServiceConfiguration())
            .region(getS3Region())
            .dualstackEnabled(true)
            .credentialsProvider(getS3CredentialProvider())
            // Note: SDK retries are disabled here.
            .overrideConfiguration { it.retryPolicy(RetryPolicy.none()) }
            .asyncConfiguration { it.advancedOption(SdkAdvancedAsyncClientOption.FUTURE_COMPLETION_EXECUTOR, Dispatchers.IO.asExecutor()) }
        if (properties.getEndpoint() != null) {
            clientBuilder.endpointOverride(properties.getEndpoint())
        }
        return clientBuilder.build()
    }

    private fun getHttpClient(): SdkAsyncHttpClient {
        return NettyNioAsyncHttpClient.builder()
            .protocol(Protocol.HTTP1_1)
            .tcpKeepAlive(true)
            .writeTimeout(WRITE_TIMEOUT)
            .readTimeout(READ_TIMEOUT)
            .connectionTimeout(CONNECTION_TIMEOUT)
            .connectionAcquisitionTimeout(CONNECTION_ACQUISITION_TIMEOUT)
            .connectionMaxIdleTime(CONNECTION_MAX_IDLE_TIME)
            .maxConcurrency(HTTP_CLIENT_MAX_CONCURRENCY)
            .maxPendingConnectionAcquires(MAX_PENDING_CONNECTION_ACQUIRE)
            .useIdleConnectionReaper(true)
            .build()
    }

    private fun getServiceConfiguration(): S3Configuration {
        return S3Configuration.builder()
            .checksumValidationEnabled(false)
            .chunkedEncodingEnabled(true)
            .accelerateModeEnabled(false)
            .build()
    }

    private fun getS3Region(): Region {
        return Region.of(properties.getRegion())
    }

    private fun getS3CredentialProvider(): AwsCredentialsProvider {
        try {
            return DefaultCredentialsProvider.create()
        } catch (ex: Exception) {
            LOG.error("Invalid S3 Credential Configuration Properties")
            throw InvalidAWSCredentialException("Invalid AWS Credential detected")
        }
    }

    @ConstructorBinding
    @ConfigurationProperties("content.storage.aws.s3")
    @Validated
    class S3ClientConfigurationProperties(
        @field:NotBlank private val region: String,
        private val endpoint: URI?
    ) {
        internal fun getRegion() = region
        internal fun getEndpoint() = endpoint

        override fun toString(): String {
            return "S3ClientConfigurationProperties(" +
                "region=${getRegion()}, " +
                "endpoint=${getEndpoint()})"
        }
    }
}
Hi @pkgonan. Error messages that include Channel was closed before it could be written to
are likely related to a pending fix currently being worked on in https://github.com/aws/aws-sdk-java-v2/pull/2883 (and I believe this is unrelated to the issue this thread was originally created for). Please follow that PR and let us know whether it resolves your issue once it is released.
I have code using the Java SDK to download a few hundred photos from an S3 bucket. The first bunch work, but things start to fail around photo number 100. It does not always fail on the same file. It seems very similar to #452, though that issue is closed. Note that I'm using
S3AsyncClient
but am downloading the files sequentially, never downloading more than one concurrently. The photos are about 500KB each.
Expected Behavior
I should be able to download all the objects in my bucket
Current Behavior
I am able to download many files, but after about 100, requests start to fail. The exception is:
With full debug logs enabled there's a ton of output, but the Netty network logs from right before things crash show:
Steps to Reproduce (for bugs)
Here is a slightly modified version of the Kotlin code that produces the above (modified to remove company sensitive stuff):
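(The original snippet is not reproduced here. A minimal sketch of the pattern it describes, sequentially downloading objects via the async client, with hypothetical bucket and key names, might look like:)

```kotlin
import java.nio.file.Paths
import software.amazon.awssdk.core.async.AsyncResponseTransformer
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.GetObjectRequest

// Hypothetical sketch: download each object one at a time, waiting for
// each future to complete before starting the next request.
val s3 = S3AsyncClient.create()
val keys = (1..300).map { "photos/photo-%03d.jpg".format(it) }
for (key in keys) {
    val request = GetObjectRequest.builder()
        .bucket("example-bucket") // hypothetical bucket name
        .key(key)
        .build()
    s3.getObject(request, AsyncResponseTransformer.toFile(Paths.get("/tmp", key)))
        .join() // sequential: at most one download in flight
}
```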
Context
We provide photos to our customers and are trying to provide a "download all" function. To do this we need to stream the data from S3 through some code that incrementally adds it to a zip file, which is provided to the customer. Since there's so much data, we can't hold it all in RAM, so we need to do this async streaming processing. We have our own
AsyncResponseTransformer
which does this, but wanted to provide an example with as little of our code as possible. Both
AsyncResponseTransformer.toFile
and our code experience the same error.
Your Environment