aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.16k stars 835 forks source link

Rapid S3 downloads causes failure #1122

Closed oliverdain closed 5 years ago

oliverdain commented 5 years ago

I have code using the Java SDK to download a few hundred photos from an S3 bucket. The first bunch work but things start to fail around photo number 100. It does not always fail on the same file. It seems very similar to #452 though that issue is closed. Note that I'm using S3AsyncClient but downloading the file sequentially, never downloading more than one concurrently.

The photos are about 500KB each.

Expected Behavior

I should be able to download all the objects in my bucket

Current Behavior

Am able to download many files but after about 100 requests star to fail. The exception is:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
    at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:50)
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:167)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:118)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:103)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:241)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$3(ResponseHandler.java:398)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:179)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:398)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
    at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
    at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:117)
    ... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
    ... 54 common frames omitted

With full debug logs enabled there's a ton of output but the response from the Netty network logs from right before things crash is:

11:16:18.311 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 - R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] READ: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 254, cap: 254/254, unwrapped: PooledUnsafeDirectByteBuf(ridx: 8446, widx: 8446, cap: 8475)), decoderResult: success), 254B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 5d 52 e1 13 91 3f b5 10 51 7b 34 f6 79 a0 39 aa |]R...?..Q{4.y.9.|
|00000010| 3a 4d fc ce 3c 68 3b 15 b5 ad 02 63 d4 f7 a0 78 |:M..<h;....c...x|
|00000020| 92 29 39 d5 39 01 a9 44 b9 c6 33 2a 8d 7e 91 f3 |.)9.9..D..3*.~..|
|00000030| 28 ea 71 58 a6 09 25 58 2f a8 4e 94 f1 24 02 65 |(.qX..%X/.N..$.e|
|00000040| a7 b5 48 11 7c 6a 95 05 d2 a8 b0 93 a1 02 d8 57 |..H.|j.........W|
|00000050| 11 04 21 ad 32 b4 2d 94 16 99 af 0e f4 00 41 2a |..!.2.-.......A*|
|00000060| 80 04 55 04 32 94 41 38 93 40 01 11 25 02 94 89 |..U.2.A8.@..%...|
|00000070| 1a 39 2a 2d 84 b0 04 12 d7 2a cd 22 60 a7 34 27 |.9*-.....*."`.4'|
|00000080| 40 94 0f 0d 0d 53 0d 85 38 5d bb 24 ea af 94 32 |@....S..8].$...2|
|00000090| 84 80 24 ad 3a 63 0c e8 f8 77 85 44 8a a0 14 1a |..$.:c...w.D....|
|000000a0| 02 74 27 51 ab e6 25 02 c6 4e a8 ed 4e 64 68 4a |.t'Q..%..N..NdhJ|
|000000b0| 4e 84 09 cf 8c 7a 0e f4 65 4c f8 2b 00 28 14 f1 |N....z..eL.+.(..|
|000000c0| 98 f0 54 0a a2 a0 24 9b 48 42 2e ed 20 ad ae 53 |..T...$.HB.. ..S|
|000000d0| a8 f0 12 a9 80 9a 05 11 a3 53 2a 72 4f 3d 70 ae |.........S*rO=p.|
|000000e0| 07 9a 61 20 27 e7 8c 27 27 2d 6a a3 ff d9 f2 d3 |..a '..''-j.....|
|000000f0| 22 9e 33 37 d2 2e 06 03 e0 06 a6 fc 72 02       |".37........r.  |
+--------+-------------------------------------------------+----------------+
11:16:18.312 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] READ COMPLETE
11:16:18.312 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] INACTIVE
11:16:18.316 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] CLOSE
11:16:18.316 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] UNREGISTERED

Steps to Reproduce (for bugs)

Here is a slightly modified version of the Kotlin code that produces the above (modified to remove company sensitive stuff):

object Main {
    val log = getLogger { }

    val s3AsyncClient = S3AsyncClient.builder()
        .credentialsProvider(DefaultCredentialsProvider.create())
        .region(Region.US_WEST_2)
        // retries disable here because we can't retry in our situation - the real use case is to stream
        // data from a server to a customer. Note that you get the same failures even with retries -- it
        // just takes longer
        .overrideConfiguration { config ->
            config.retryPolicy(RetryPolicy.none()).build()
        }
        .build()

    val daBucket = "com-revl-revlx"

    fun doDownload(key: String) {
        val outFile = Paths.get("/tmp/experiment/").resolve(key)
        outFile.parent.toFile().mkdirs()
        if (outFile.toFile().exists()) {
            outFile.toFile().delete()
        }
        log.info("Downloading via toFile: {}", key)
        val request = GetObjectRequest.builder()
            .bucket(daBucket)
            .key(key)
            .build()

        val future = s3AsyncClient.getObject(request, AsyncResponseTransformer.toFile(outFile))

        try {
            future.join()
            log.info("Download of {} succeeded", key)
        } catch (t: Throwable) {
            log.error("Download of {} failed with:", key, t)
            throw t
        }
    }

    @JvmStatic
    fun main(argv: Array<String>) {
        ThreadUtils.setLoggingDefaultExceptionHandler()
        val keys = (1..240).toList()

        var numDone = 1
        for (i in keys) {
            log.info("Process file number {}", numDone)
            ++numDone
            val key = String.format("path/to/photo_%05d.jpg", i)
            doDownload(key)
        }
    }
}

Context

We provide photos to our customers and are trying to provide a "download all" function. To do this we need to stream the data from S3 through some code that incrementally adds it to a zip file which is provided to the customer. Since there's so much data we can't hold it all in RAM so we need to do this async-streaming processing. We have our own AsyncResponseTransformer which does this but wanted to provide an example with as little of our code as possible. AsyncResponseTransformer.toFile and our code experience the same error.

Your Environment

zoewangg commented 5 years ago

Thanks for reporting! We are aware of the issue and it is caused by the fact that service can close connection after certain time and we are actively working on a fix.

oliverdain commented 5 years ago

Thanks for the quick response @zoewangg . I'm actively looking for workarounds because this is causing issues in customer facing code for us. When you say, "service can close connection after certain time" do you mean this is a server-side S3 bug? I was considering replacing our use of the AWS SDK with just a plain HTTP client but if this issue is server side that obviously won't help.

Is there anything we can do to work around this? I tried a few obvious things like putting short delays between file downloads but that didn't help.

Any short-term workaround advice would be greatly appreciated.

Also, a very rough timeline would also be great. I know you can't promise a date, but if you're working on this now and think you'll have a fix in days it doesn't make sense for me to spend a lot of time on workarounds. OTOH, if nobody is going to start working on this bug for a few weeks I need to do something.

zoewangg commented 5 years ago

No, it's a bug in the SDK. Would v2 sync clients work for you?

I think this exception would get retried and should succeed in the next attempt, is it not the case for you?

oliverdain commented 5 years ago

Would v2 sync clients work for you?

We could spawn threads and use sync clients. A bit less efficient but would be OK.

I think this exception would get retried and should succeed in the next attempt, is it not the case for you?

What we see is that with retries enabled this bug is less common but still appears quite often (e.g. every 250 files instead of every 100).

Even if retries did work we can't use them though. Our use case is to create a zip file containing all the data from all of these photos in a streaming way without holding it all in RAM so retries aren't an option for us; they'd cause a corrupt file because the zip archive would contain part of the file (the part that failed part way through) and then on retry that part of the file would be repeated.

zoewangg commented 5 years ago

Thanks for the information. Please switch to v2 sync clients if that's easy workaround for you. We don't have a clear date for the fix but fixing it is one of our top priorities right now.

zoewangg commented 5 years ago

Update: unfortunately, it will take longer time than expected to fix this issue.

UPDATE: Please note that using HTTP is not as secure as HTTPS and is vulnerable to security attacks https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP. We do not recommend using it.

dazito commented 5 years ago

We are also experiencing this issue with

 <dependency>
     <groupId>software.amazon.awssdk</groupId>
     <artifactId>s3</artifactId>
     <version>2.5.6</version>
 </dependency>

Our use case is a job downloading around 20 files per execution, using the S3AsyncClient. We get this issue around every 3rd to 5th job execution.

Thanks for the temporary workaround, @zoewangg

vvellanki commented 5 years ago

Any update. We are hitting this issue too.

vvellanki commented 5 years ago

We tried implementing the workaround suggested. How do we create the override URI for different regions?

zoewangg commented 5 years ago

Update: we reached out to the service team and we are still working on finding the right solution for it.

Please note that using HTTP is not as secure as HTTPS and is vulnerable to security attacks https://en.wikipedia.org/wiki/HTTPS#Difference_from_HTTP. We do not recommend using it.

@vvellanki You can find all S3 endpoints in https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

vvellanki commented 5 years ago

The options I see are: 1) Use http instead of https - does not work for us 2) Use sync v2 APIs instead of async - in our case, the change doesn't appear trivial due to dependencies. We hit some issues due to using older httpclient jar 3) Wait for a fix for this issue

How about if we restrict the number of concurrent calls to a fixed number? Is that a workaround?

zoewangg commented 5 years ago

Summary The root cause is that service might close idle connections for a certain amount of time and there might be a race condition where a soon-stale connection gets picked up by SDK and the connection gets closed in the same time SDK sends out the request.

Reducing connectionMaxIdleTime should reduce the chances of occurrences of this issue.

@vvellanki Can you try to update connectionMaxIdleTime to 5 seconds to see if it helps?

        s3Async = S3AsyncClient.builder()
                               .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                                                  .connectionMaxIdleTime(Duration.ofSeconds(5)))
                               .build();
fhsgoncalves commented 5 years ago

We are experiencing a similar issue, but with DynamoDB:

The application is a worker, consuming from kinesis, using KCL, and reading and writing to dynamo.

The stacktrace when using aws-sdk-java-v2 2.5.11: Note: when we tried the version aws sdk 2.5.25, the exception is not thrown, but the code is stuck forever trying to read/write from dynamo (we call .await() in a GlobalScope.async {}, waiting for dynamo complete).

software.amazon.awssdk.core.exception.SdkClientException: null
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
    at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:123)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:105)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:241)
    at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:237)
    at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$writeRequest$6(NettyRequestExecutor.java:191)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:991)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:873)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1365)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
    at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:113)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
    at io.netty.handler.ssl.SslHandler.finishWrap(SslHandler.java:894)
    at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:880)
    at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:797)
    at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:778)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.flush(CombinedChannelDuplexHandler.java:533)
    at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
    at io.netty.channel.CombinedChannelDuplexHandler.flush(CombinedChannelDuplexHandler.java:358)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
    at io.netty.handler.logging.LoggingHandler.flush(LoggingHandler.java:265)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
    at com.typesafe.netty.http.HttpStreamsHandler.completeBody(HttpStreamsHandler.java:298)
    at com.typesafe.netty.http.HttpStreamsHandler.access$400(HttpStreamsHandler.java:14)
    at com.typesafe.netty.http.HttpStreamsHandler$3$1.run(HttpStreamsHandler.java:276)
    at com.typesafe.netty.http.HttpStreamsHandler.executeInEventLoop(HttpStreamsHandler.java:342)
    at com.typesafe.netty.http.HttpStreamsHandler.access$300(HttpStreamsHandler.java:14)
    at com.typesafe.netty.http.HttpStreamsHandler$3.complete(HttpStreamsHandler.java:273)
    at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:244)
    at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:241)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
    at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
    at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndCompleteAll(AbstractCoalescingBufferQueue.java:340)
    at io.netty.channel.AbstractCoalescingBufferQueue.releaseAndFailAll(AbstractCoalescingBufferQueue.java:207)
    at io.netty.handler.ssl.SslHandler.releaseAndFailAll(SslHandler.java:1591)
    at io.netty.handler.ssl.SslHandler.access$900(SslHandler.java:166)
    at io.netty.handler.ssl.SslHandler$5.run(SslHandler.java:1785)
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)
java.nio.channels.ClosedChannelException: null
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)

Are these issues related? I'm thinking in use this lib: https://github.com/bradynpoulsen/aws-sdk-kotlin-coroutines/, that is a wrapper for the aws-sdk-java v1, but with support to coroutines out of the box. It should be less efficient, but it may work.

I tried to use http as the protocol in dynamo endpoint and also set connectionMaxIdleTime to 5 seconds, without success. The problem happens after some minutes that the application is running when using with KCL v2, and after a greater period when using KCL v1.

I think we've experienced this same behavior before, with s3, but we change to use s3 from aws v1, and the issue has gone away. Now, we are experiencing the issue with dynamo.

Environment:

Eyal-Shalev commented 5 years ago

I've experienced this issue as well. We are downloading the same list of files (about 60 small json files) every 5 seconds (to check for changes).

The change proposed by @zoewangg does seem to mitigate this issue.

millems commented 5 years ago

This issue seems fairly bad. The following code fails after a few iterations:

try (S3AsyncClient client = S3AsyncClient.create()) {
    client.putObject(r -> r.bucket("tmp.millems").key("test"), AsyncRequestBody.fromBytes("".getBytes())).join();

    for (int i = 0; i < 1000; ++i) {
        System.out.println("Iteration " + i);
        Path outputPath = Paths.get("/tmp/foo");
        Files.deleteIfExists(outputPath);
        client.getObject(r -> r.bucket("tmp.millems").key("test"), AsyncResponseTransformer.toFile(outputPath)).join();
    }
}

SDK retries don't seem to help fix the problem.

millems commented 5 years ago

Failure logs:

2019-04-16 16:34:21,476 [aws-java-sdk-NettyEventLoop-1-4] DEBUG software.amazon.awssdk.request - Retryable error detected, will retry in 311ms, attempt number 3
2019-04-16 16:34:21,477 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0xf04bce0b, L:/10.94.46.143:56881 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] CLOSE
2019-04-16 16:34:21,477 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0xf04bce0b, L:/10.94.46.143:56881 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] UNREGISTERED
2019-04-16 16:34:21,788 [sdk-ScheduledExecutor-2-1] DEBUG software.amazon.awssdk.request - Retrying Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=s3.eu-west-1.amazonaws.com, port=443, encodedPath=/tmp.millems/test, headers=[amz-sdk-invocation-id, User-Agent, x-amz-te], queryParameters=[])
2019-04-16 16:34:21,791 [sdk-ScheduledExecutor-2-1] DEBUG software.amazon.awssdk.auth.signer.Aws4Signer - AWS4 String to sign: AWS4-HMAC-SHA256
20190416T233421Z
20190416/eu-west-1/s3/aws4_request
7f129f3fe741eebd1b6aba5a36006198c2cb6275da89198fd117ce9730c2eeae
2019-04-16 16:34:21,797 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979] REGISTERED
2019-04-16 16:34:21,798 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979] CONNECT: s3.eu-west-1.amazonaws.com/52.218.24.211:443
2019-04-16 16:34:21,965 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] ACTIVE
2019-04-16 16:34:21,982 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE: software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest(DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET https://s3.eu-west-1.amazonaws.com/tmp.millems/test HTTP/1.1
amz-sdk-invocation-id: 4126936f-2cb5-a0e6-ae52-e5f1e122ff4a
amz-sdk-retry: 3/311/485
Authorization: xxx
Host: s3.eu-west-1.amazonaws.com
User-Agent: aws-sdk-java/2.5.24-SNAPSHOT Mac_OS_X/10.12.6 Java_HotSpot_TM__64-Bit_Server_VM/25.202-b08 Java/1.8.0_202 vendor/Oracle_Corporation
x-amz-content-sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20190416T233421Z
x-amz-te: append-md5)
2019-04-16 16:34:21,982 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE, DefaultHttpContent(data: EmptyByteBufBE, decoderResult: success), 0B
2019-04-16 16:34:21,983 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:22,154 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,168 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,174 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,184 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,192 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,359 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.ssl.SslHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] HANDSHAKEN: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
2019-04-16 16:34:22,360 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] USER_EVENT: SslHandshakeCompletionEvent(SUCCESS)
2019-04-16 16:34:22,364 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] WRITE, EmptyLastHttpContent, 0B
2019-04-16 16:34:22,367 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] FLUSH
2019-04-16 16:34:22,369 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:22,599 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ: DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
x-amz-id-2: SkrTAHIrukTmY1Pm6f6fMJqTzAGJrtfbl6khnC/LWoPjd4lyxQlhjtzsOTeEpqZ130INC2fp9PE=
x-amz-request-id: FE32414B3EFE3E86
Date: Tue, 16 Apr 2019 23:34:23 GMT
Last-Modified: Tue, 16 Apr 2019 23:34:02 GMT
ETag: "d41d8cd98f00b204e9800998ecf8427e"
x-amz-transfer-encoding: append-md5
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 16
Server: AmazonS3
2019-04-16 16:34:22,600 [aws-java-sdk-NettyEventLoop-1-2] DEBUG software.amazon.awssdk.request - Received successful response: 200
2019-04-16 16:34:22,601 [aws-java-sdk-NettyEventLoop-1-2] DEBUG software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain - Interceptor 'software.amazon.awssdk.services.s3.internal.handlers.EnableTrailingChecksumInterceptor@74294adb' modified the message with its modifyResponse method.
2019-04-16 16:34:22,602 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 16, cap: 16/16, unwrapped: PooledUnsafeDirectByteBuf(ridx: 16, widx: 16, cap: 45)), decoderResult: success), 16B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| d4 1d 8c d9 8f 00 b2 04 e9 80 09 98 ec f8 42 7e |..............B~|
+--------+-------------------------------------------------+----------------+
2019-04-16 16:34:22,602 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:28,643 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] USER_EVENT: SslCloseCompletionEvent(SUCCESS)
2019-04-16 16:34:28,644 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:29,065 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 - R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] READ COMPLETE
2019-04-16 16:34:29,067 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] INACTIVE
2019-04-16 16:34:29,184 [aws-java-sdk-NettyEventLoop-1-4] DEBUG io.netty.buffer.PoolThreadCache - Freed 19 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-4
2019-04-16 16:34:29,184 [aws-java-sdk-NettyEventLoop-1-3] DEBUG io.netty.buffer.PoolThreadCache - Freed 22 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-3
2019-04-16 16:34:29,192 [aws-java-sdk-NettyEventLoop-1-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 18 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-1
2019-04-16 16:34:29,193 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] CLOSE
2019-04-16 16:34:29,194 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.handler.logging.LoggingHandler - [id: 0x7c275979, L:/10.94.46.143:56889 ! R:s3.eu-west-1.amazonaws.com/52.218.24.211:443] UNREGISTERED
2019-04-16 16:34:31,607 [aws-java-sdk-NettyEventLoop-1-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 19 thread-local buffer(s) from thread: aws-java-sdk-NettyEventLoop-1-2

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException

    at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException
...
millems commented 5 years ago

It looks like the SDK might not properly be finalizing the response, because the debug log shows content-length: 16 as well as 16 bytes being returned.

Update 1: ResponseHandler > PublisherAdapter > subscribe() > new Subscriber > onComplete() is not being invoked in the case that an exception is being thrown. It should be invoked. We can tell it's the end of the stream based on the fact that ResponseHandler > PublisherAdapter > subscribe() > new Subscriber > onNext(HttpContent) was invoked with a LastHttpContent.

Update 2: On a good call where LastHttpContent triggers the onComplete, the onNext is invoked in this way:

java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1336)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:268)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:222)
    at com.typesafe.netty.HandlerPublisher.publishMessage(HandlerPublisher.java:367)
    at com.typesafe.netty.HandlerPublisher.channelRead(HandlerPublisher.java:345)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at com.typesafe.netty.http.HttpStreamsHandler.handleReadHttpContent(HttpStreamsHandler.java:184)
    at com.typesafe.netty.http.HttpStreamsHandler.channelRead(HttpStreamsHandler.java:165)
    at com.typesafe.netty.http.HttpStreamsClientHandler.channelRead(HttpStreamsClientHandler.java:148)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
...

On a bad call where LastHttpContent does not trigger the onComplete, the onNext is invoked in this way:

java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1336)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:268)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onNext(ResponseHandler.java:222)
    at com.typesafe.netty.HandlerPublisher.publishMessage(HandlerPublisher.java:367)
    at com.typesafe.netty.HandlerPublisher.flushBuffer(HandlerPublisher.java:304)
    at com.typesafe.netty.HandlerPublisher.receivedDemand(HandlerPublisher.java:258)
    at com.typesafe.netty.HandlerPublisher.access$200(HandlerPublisher.java:41)
    at com.typesafe.netty.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:452)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)

Update 3: It looks like we're dropping downstream publisher demand in the checksum validator.

millems commented 5 years ago

Created #1210, which will help with some of these issues with S3AsyncClient.getObject.

millems commented 5 years ago

An improvement for this issue will go out with tomorrow's release. Please test it out and see if it fixes the problem for you.

millems commented 5 years ago

A change has gone out. Can you please update to the latest client version and see if you're still seeing the problem?

vvellanki commented 5 years ago

Tried with the latest build 2.5.29. Still seeing the same issue.

Caused By (java.io.IOException) Server failed to send complete response software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted():398 software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive():148 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.handler.logging.LoggingHandler.channelInactive():167 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75 io.netty.handler.timeout.IdleStateHandler.channelInactive():277 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive():420 io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed():377 io.netty.handler.codec.ByteToMessageDecoder.channelInactive():342 io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive():282 io.netty.channel.CombinedChannelDuplexHandler.channelInactive():223 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed():377 io.netty.handler.codec.ByteToMessageDecoder.channelInactive():342 io.netty.handler.ssl.SslHandler.channelInactive():1010 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.ChannelInboundHandlerAdapter.channelInactive():75 io.netty.handler.timeout.IdleStateHandler.channelInactive():277 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive():224 io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive():1409 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():245 io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive():231 io.netty.channel.DefaultChannelPipeline.fireChannelInactive():927 io.netty.channel.AbstractChannel$AbstractUnsafe$8.run():822 io.netty.util.concurrent.AbstractEventExecutor.safeExecute():163 io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks():404 io.netty.channel.nio.NioEventLoop.run():463 io.netty.util.concurrent.SingleThreadEventExecutor$5.run():886 java.lang.Thread.run():748

millems commented 5 years ago

@vvellanki Thanks for checking. Were you able to update connectionMaxIdleTime to 5 seconds to see if it helps?

We want to see if making this the default will fix the problem for customers with access patterns like yours.

Kumarnitin2001 commented 5 years ago

@millems I still face the issue reducing the connectionMaxIdleTime to 5 seconds and the latest version 2.5.29 S3AsyncClient.builder().httpClientBuilder(NettyNioAsyncHttpClient.builder() .writeTimeout(Duration.ofSeconds(100l)) .readTimeout(Duration.ofSeconds(100l)) .connectionMaxIdleTime(Duration.ofSeconds(5))).build()

millems commented 5 years ago

@Kumarnitin2001 Thanks for confirmation.

Eyal-Shalev commented 5 years ago

@millems Is there any news regarding this issue?

We would rather not restart our app once a day (if possible)

millems commented 5 years ago

We haven't been able to reproduce this specific issue with connectionMaxIdleTime(Duration.ofSeconds(5)) set. That's fixed it for all of our scenarios. We're still trying to reproduce it. If you have a repro case, that would be fantastic.

Eyal-Shalev commented 5 years ago

@millems this issue is still occurring for us (even with connectionMaxIdleTime(Duration.ofSeconds(5))).

An example from our logs: App start: 19-05-22 12:22:55 1st error: 19-05-22 16:26:05 (list objects timeout) App stop (manually): 19-05-24 17:33:24 During the above run time, there were 56 list-objects timeouts & 21 get-objects timeouts (see below). We add a 5 second timeout to each of our requests, after which we stop waiting and return an error.

19-05-22 16:26:05 ERROR - :list-objects!
19-05-22 17:16:53 ERROR - :list-objects!
19-05-22 17:53:26 ERROR - :list-objects!
19-05-22 20:38:37 ERROR - :list-objects!
19-05-22 22:44:16 ERROR - :list-objects!
19-05-22 23:33:55 ERROR - :list-objects!
19-05-23 00:07:52 ERROR - :list-objects!
19-05-23 00:20:22 ERROR - :list-objects!
19-05-23 00:53:55 ERROR - :list-objects!
19-05-23 02:20:00 ERROR - :list-objects!
19-05-23 04:11:37 ERROR - :list-objects!
19-05-23 06:38:47 ERROR - :list-objects!
19-05-23 09:42:29 ERROR - :list-objects!
19-05-23 12:36:40 ERROR - :list-objects!
19-05-23 14:16:17 ERROR - :list-objects!
19-05-23 17:09:57 ERROR - :list-objects!
19-05-23 18:23:32 ERROR - :list-objects!
19-05-23 20:04:08 ERROR - :list-objects!
19-05-23 21:17:39 ERROR - :get-object!
19-05-23 21:17:39 ERROR - :get-object!
19-05-23 23:33:50 ERROR - :list-objects!
19-05-24 00:41:25 ERROR - :list-objects!
19-05-24 01:03:38 ERROR - :get-object!
19-05-24 01:15:51 ERROR - :get-object!
19-05-24 01:15:51 ERROR - :get-object!
19-05-24 01:15:51 ERROR - :get-object!
19-05-24 01:16:01 ERROR - :get-object!
19-05-24 01:16:01 ERROR - :get-object!
19-05-24 01:45:57 ERROR - :get-object!
19-05-24 01:56:33 ERROR - :get-object!
19-05-24 05:05:30 ERROR - :get-object!
19-05-24 05:28:07 ERROR - :list-objects!
19-05-24 05:33:37 ERROR - :list-objects!
19-05-24 06:39:46 ERROR - :list-objects!
19-05-24 09:21:58 ERROR - :get-object!
19-05-24 09:22:08 ERROR - :list-objects!
19-05-24 09:26:56 ERROR - :list-objects!
19-05-24 09:35:57 ERROR - :list-objects!
19-05-24 11:04:48 ERROR - :get-object!
19-05-24 11:05:24 ERROR - :list-objects!
19-05-24 11:05:34 ERROR - :list-objects!
19-05-24 11:05:44 ERROR - :get-object!
19-05-24 11:05:54 ERROR - :list-objects!
19-05-24 11:06:04 ERROR - :list-objects!
19-05-24 11:06:14 ERROR - :list-objects!
19-05-24 11:06:35 ERROR - :list-objects!
19-05-24 11:06:50 ERROR - :list-objects!
19-05-24 11:07:00 ERROR - :list-objects!
19-05-24 11:07:15 ERROR - :list-objects!
19-05-24 11:07:41 ERROR - :list-objects!
19-05-24 11:07:56 ERROR - :list-objects!
19-05-24 11:08:06 ERROR - :list-objects!
19-05-24 11:08:21 ERROR - :list-objects!
19-05-24 11:08:36 ERROR - :get-object!
19-05-24 12:04:21 ERROR - :get-object!
19-05-24 13:27:13 ERROR - :get-object!
19-05-24 13:27:23 ERROR - :get-object!
19-05-24 15:16:44 ERROR - :get-object!
19-05-24 15:16:44 ERROR - :get-object!
19-05-24 15:16:44 ERROR - :get-object!
19-05-24 15:17:04 ERROR - :list-objects!
19-05-24 15:17:14 ERROR - :list-objects!
19-05-24 15:17:29 ERROR - :list-objects!
19-05-24 15:17:50 ERROR - :list-objects!
19-05-24 15:18:16 ERROR - :list-objects!
19-05-24 15:18:26 ERROR - :list-objects!
19-05-24 15:18:36 ERROR - :list-objects!
19-05-24 15:19:02 ERROR - :list-objects!
19-05-24 15:19:17 ERROR - :list-objects!
19-05-24 15:19:27 ERROR - :list-objects!
19-05-24 15:19:48 ERROR - :list-objects!
19-05-24 15:20:08 ERROR - :list-objects!
19-05-24 15:20:18 ERROR - :list-objects!
19-05-24 15:20:28 ERROR - :list-objects!
19-05-24 15:20:38 ERROR - :list-objects!
19-05-24 16:12:20 ERROR - :list-objects!
19-05-24 16:18:46 ERROR - :list-objects!
marshallpierce commented 5 years ago

I'm still seeing this reliably after downloading small files from s3 with several concurrent requests for a couple of minutes with 2.5.53.

gibffe commented 5 years ago

I can confirm this is still an issue. Affected version: 2.5.54

zoewangg commented 5 years ago

@gibffe @marshallpierce @Eyal-Shalev

I was able to reproduce the issue by downloading an object above 1g using an instance with local NVMe storage; we have pushed out a fix #1335 and it is available in 2.7.4. Could you try to see if the problem still exists with latest version?

kran7hi commented 5 years ago

I'm still getting the error with 2.7.4 here is the code.

`final Path tempFile = Files.createTempFile(workDirectory, tempFilePrefix, "tmp");
    AwsCredentialsProvider creds = StaticCredentialsProvider.create(AwsBasicCredentials.create(awsKey, 
awsSecret));
    S3AsyncClient client = S3AsyncClient.builder().credentialsProvider(creds).build();

    final CompletableFuture<GetObjectResponse> futureGet = client.getObject(
            GetObjectRequest.builder()
                    .bucket(s3Bucket)
                    .key(s3Path)
                    .build(),
            AsyncResponseTransformer.toFile(tempFile.toFile()));

    futureGet.whenComplete((resp, err) -> {
        try {
            if (resp != null) {
                System.out.println(resp);
            } else {
                err.printStackTrace();
            }
        } finally {
            client.close();
        }
    });

    futureGet.join();`

Here is the stack trace. 16:32:23,432 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException 16:32:23,432 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61) 16:32:23,432 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51) 16:32:23,433 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage$$Lambda$260/1366089113.apply(Unknown Source) 16:32:23,433 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture$HandleCompletion.run(CompletableFuture.java:1403) 16:32:23,433 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) 16:32:23,434 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210) 16:32:23,434 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723) 16:32:23,434 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) 16:32:23,434 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210) 16:32:23,434 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723) 16:32:23,435 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) 16:32:23,435 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361) 16:32:23,435 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175) 16:32:23,435 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126) 16:32:23,436 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107) 16:32:23,436 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor$$Lambda$256/24632579.accept(Unknown Source) 16:32:23,436 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture$WhenCompleteCompletion.run(CompletableFuture.java:1298) 16:32:23,437 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) 16:32:23,437 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361) 16:32:23,437 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249) 16:32:23,437 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$exceptionCaught$0(ResponseHandler.java:142) 16:32:23,437 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$$Lambda$362/1655983226.run(Unknown Source) 16:32:23,438 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180) 16:32:23,439 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.exceptionCaught(ResponseHandler.java:142) 16:32:23,440 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285) 16:32:23,440 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:856) 16:32:23,440 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364) 16:32:23,441 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) 16:32:23,441 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) 16:32:23,441 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at com.typesafe.netty.HandlerPublisher.channelRead(HandlerPublisher.java:356) 16:32:23,441 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) 16:32:23,442 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) 16:32:23,442 ERROR [stderr] (aws-java-sdk-NettyEventLoop-1-4) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) Caused by: software.amazon.awssdk.core.exception.SdkClientException at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97) [sdk-core.jar:] at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98) [sdk-core.jar:] at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125) [sdk-core.jar:] ... 71 more Caused by: java.lang.NullPointerException at software.amazon.awssdk.core.internal.http.async.AsyncStreamingResponseHandler.onHeaders(AsyncStreamingResponseHandler.java:56) [sdk-core.jar:] at software.amazon.awssdk.core.internal.http.async.AsyncAfterTransmissionInterceptorCallingResponseHandler.onHeaders(AsyncAfterTransmissionInterceptorCallingResponseHandler.java:67) [sdk-core.jar:] at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onHeaders(MakeAsyncHttpRequestStage.java:235) [sdk-core.jar:] at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:94) [netty-nio-client.jar:] at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:65) [netty-nio-client.jar:] at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-transport.jar:4.1.33.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport.jar:4.1.33.Final] ... 58 more

zoewangg commented 5 years ago

Hi @kranthigurram, that seems like a different issue. I created a separate issue #1343 to track it. We will take a look at it.

kran7hi commented 5 years ago

@zoewangg Thank you

gibffe commented 5 years ago

Thanks @zoewangg for looking into this issue. The client seems to be stable now. 🤞 🤞 🤞 I'll run a soak test and report back.

Eyal-Shalev commented 5 years ago

@zoewangg I tried upgrading to 2.7.5, but now I'm getting Server failed to send complete response on ElasticLoadBalancingAsyncClient.registerInstancesWithLoadBalancer

Edit: this only occurs sometimes, but didn't happen that often on 2.5.29

trace

  [software.amazon.awssdk.utils.CompletableFutureUtils errorAsCompletionException "CompletableFutureUtils.java" 61]
  [software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl build "SdkClientException.java" 97]
  [software.amazon.awssdk.http.nio.netty.internal.ResponseHandler notifyIfResponseNotCompleted "ResponseHandler.java" 396]
  [software.amazon.awssdk.http.nio.netty.internal.ResponseHandler channelInactive "ResponseHandler.java" 148]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.handler.logging.LoggingHandler channelInactive "LoggingHandler.java" 167]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.handler.timeout.IdleStateHandler channelInactive "IdleStateHandler.java" 277]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext fireChannelInactive "CombinedChannelDuplexHandler.java" 420]
  [io.netty.handler.codec.ByteToMessageDecoder channelInputClosed "ByteToMessageDecoder.java" 390]
  [io.netty.handler.codec.ByteToMessageDecoder channelInactive "ByteToMessageDecoder.java" 355]
  [io.netty.handler.codec.http.HttpClientCodec$Decoder channelInactive "HttpClientCodec.java" 282]
  [io.netty.channel.CombinedChannelDuplexHandler channelInactive "CombinedChannelDuplexHandler.java" 223]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.handler.codec.ByteToMessageDecoder channelInputClosed "ByteToMessageDecoder.java" 390]
  [io.netty.handler.codec.ByteToMessageDecoder channelInactive "ByteToMessageDecoder.java" 355]
  [io.netty.handler.ssl.SslHandler channelInactive "SslHandler.java" 1054]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.ChannelInboundHandlerAdapter channelInactive "ChannelInboundHandlerAdapter.java" 75]
  [io.netty.handler.timeout.IdleStateHandler channelInactive "IdleStateHandler.java" 277]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.AbstractChannelHandlerContext fireChannelInactive "AbstractChannelHandlerContext.java" 224]
  [io.netty.channel.DefaultChannelPipeline$HeadContext channelInactive "DefaultChannelPipeline.java" 1403]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 245]
  [io.netty.channel.AbstractChannelHandlerContext invokeChannelInactive "AbstractChannelHandlerContext.java" 231]
  [io.netty.channel.DefaultChannelPipeline fireChannelInactive "DefaultChannelPipeline.java" 912]
  [io.netty.channel.AbstractChannel$AbstractUnsafe$8 run "AbstractChannel.java" 826]
  [io.netty.util.concurrent.AbstractEventExecutor safeExecute "AbstractEventExecutor.java" 163]
  [io.netty.util.concurrent.SingleThreadEventExecutor runAllTasks "SingleThreadEventExecutor.java" 404]
  [io.netty.channel.nio.NioEventLoop run "NioEventLoop.java" 495]
  [io.netty.util.concurrent.SingleThreadEventExecutor$5 run "SingleThreadEventExecutor.java" 905]
  [java.lang.Thread run "Thread.java" 834]
zoewangg commented 5 years ago

@Eyal-Shalev Server failed to send complete response can happen when the service closes the connection mid request and this is valid per HTTP spec. Normally retry would work; is there any chance retry is disabled on ElasticLoadBalancingAsyncClient? If not, reducing connectionMaxIdleTime would help by closing idle connections more frequently. Are you still experiencing issue with s3 async client?

miere commented 5 years ago

Guys, I'm still experiencing the same issue as the other devs. In my case it affects my Kinesis producer which is ingesting a couple of thousands of messages per second.

It worth mention that I'm using connectionMaxIdleTime of 5 seconds, as suggested earlier on this thread.

I'm using JDK 11 with the following libraries:

properties:
  # VERSIONS
  version.kotlin: 1.3.40
  version.jackson: 2.9.9
  version.aws: 2.7.15
  version.aws-kinesis: 2.2.1

Here is the stack trace.

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1076)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:221)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:228)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:827)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
... 54 common frames omitted
ybhatnagar commented 5 years ago

We are also facing the same issue with the async client version 2.7.13. Our use case involves making a few calls to s3 to serve UI requests, and we wanted to use the async client. We are getting this error quite frequently, around the same rate as @dazito mentioned, and is becoming difficult for us to constantly monitor and restart the service. @zoewangg In case if there is not an immediate fix planned in an upcoming version, we might have to do try out s3 sync client, and in case it works, performance impact analysis. Do you have any suggestions on what should be the way forward?

zoewangg commented 5 years ago

@miere The suggested setting connectionMaxIdleTim = 5 seconds only applies to s3 client. This github issue tracks s3 specific issues, could you create a new issue for it (with repro case if possible)? We will look into it.

@ybhatnagar To be clear, are you having the issue with s3 sync client or async client? Have you tried to set connectionMaxIdleTime to 5 seconds? What APIs are you using? Is retry disabled? (retry is enabled by default).

Eyal-Shalev commented 5 years ago

@zoewangg It seems like our issue with S3 connection was indeed resolved.
We are implementing retry ourselves (with exponential backoff), and to mitigate the issue with registerInstancesWithLoadBalancer we had to set up to 10 retries (5 wasn't enough).

Eyal-Shalev commented 5 years ago

@zoewangg We haven't disabled the retry, is it enabled by default?

zoewangg commented 5 years ago

@Eyal-Shalev Glad to hear the issue was resolved for s3 client. Retry is enabled be default and the default max number of retries is 3.

zoewangg commented 5 years ago

Closing the issue due to no recent response. Feel free to open a new issue if you continue to see the error.

seanzxx commented 4 years ago

I'm facing this issue with version 2.5.61 and I find this discussion.
I have one question, do I need to set connectionMaxIdleTime(Duration.ofSeconds(5)) after I update the dependency to version 2.9.11?

zoewangg commented 4 years ago

We just merged the change to updated the default connectionMaxIdleTime for s3 client via #1450 and it will be included in the next release.

Before that, you'd still need to set connectionMaxIdleTime(Duration.ofSeconds(5))

jaylynstoesz commented 4 years ago

I'm seeing this issue in the latest version of the SDK. I'm reading from SQS via Alpakka using the async client. I tried setting the connectionMaxIdleTime to 5 seconds and I'm still getting the same error. Kotlin: 1.3.41 Netty & SQS Java SDK: 2.9.15

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
    at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:397)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:180)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
    at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)  
at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1078)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:827)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
    at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
    ... 52 more
Caused by: java.io.IOException: Server failed to send complete response
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
    ... 43 more
debora-ito commented 4 years ago

Hi @jaylynstoesz, I see you're using SQS so can you please open a new github issue, with a sample code we can use to repro? Setting connectionMaxIdleTime to 5 seconds is a suggestion tailored for S3 clients.

alejandrod commented 4 years ago

For SNS/SQS I have created this ticket https://github.com/aws/aws-sdk-java-v2/issues/1520 @jaylynstoesz

pkgonan commented 2 years ago

@millems @debora-ito @zoewangg

Hi. Reducing CONNECTION_MAX_IDLE_TIMEOUT to 5 seconds is not a workaround. I'm still having the following problem: Any solution?

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Channel was closed before it could be written to.

https://github.com/aws/aws-sdk-java-v2/blob/51d067e8f51d40a1e6214375ddf147ebed29c002/http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/NettyNioAsyncHttpClient.java#L80

[S3 Configuration] Please let me know if there are any settings that need to be changed because they are too low or too high.

@Configuration
@EnableConfigurationProperties(S3ClientConfig.S3ClientConfigurationProperties::class)
open class S3ClientConfig(
    private val properties: S3ClientConfigurationProperties
) {

    companion object {
        private const val HTTP_CLIENT_MAX_CONCURRENCY = 1000
        private const val MAX_PENDING_CONNECTION_ACQUIRE = 10000

        private val LOG = LoggerFactory.getLogger(this::class.java)
        private val CONNECTION_TIMEOUT = Duration.ofSeconds(2)
        private val CONNECTION_ACQUISITION_TIMEOUT = Duration.ofSeconds(2)
        private val CONNECTION_MAX_IDLE_TIME = Duration.ofSeconds(5)
        private val WRITE_TIMEOUT = Duration.ofSeconds(30)
        private val READ_TIMEOUT = Duration.ofSeconds(30)
    }

    @Bean
    open fun s3AsyncClient(): S3AsyncClient {
        val clientBuilder = S3AsyncClient.builder()
            .httpClient(getHttpClient())
            .serviceConfiguration(getServiceConfiguration())
            .region(getS3Region())
            .dualstackEnabled(true)
            .credentialsProvider(getS3CredentialProvider())
            .overrideConfiguration { it.retryPolicy(RetryPolicy.none()) }
            .asyncConfiguration { it.advancedOption(SdkAdvancedAsyncClientOption.FUTURE_COMPLETION_EXECUTOR, Dispatchers.IO.asExecutor()) }

        if (properties.getEndpoint() != null) {
            clientBuilder.endpointOverride(properties.getEndpoint())
        }

        return clientBuilder.build()
    }

    private fun getHttpClient(): SdkAsyncHttpClient {
        return NettyNioAsyncHttpClient.builder()
            .protocol(Protocol.HTTP1_1)
            .tcpKeepAlive(true)
            .writeTimeout(WRITE_TIMEOUT)
            .readTimeout(READ_TIMEOUT)
            .connectionTimeout(CONNECTION_TIMEOUT)
            .connectionAcquisitionTimeout(CONNECTION_ACQUISITION_TIMEOUT)
            .connectionMaxIdleTime(CONNECTION_MAX_IDLE_TIME)
            .maxConcurrency(HTTP_CLIENT_MAX_CONCURRENCY)
            .maxPendingConnectionAcquires(MAX_PENDING_CONNECTION_ACQUIRE)
            .useIdleConnectionReaper(true)
            .build()
    }

    private fun getServiceConfiguration(): S3Configuration {
        return S3Configuration.builder()
            .checksumValidationEnabled(false)
            .chunkedEncodingEnabled(true)
            .accelerateModeEnabled(false)
            .build()
    }

    private fun getS3Region(): Region {
        return Region.of(properties.getRegion())
    }

    private fun getS3CredentialProvider(): AwsCredentialsProvider {
        try {
            return DefaultCredentialsProvider.create()
        } catch (ex: Exception) {
            LOG.error("Invalid S3 Credential Configuration Properties")
            throw InvalidAWSCredentialException("Invalid AWS Credential detected")
        }
    }

    @ConstructorBinding
    @ConfigurationProperties("content.storage.aws.s3")
    @Validated
    class S3ClientConfigurationProperties(
        @field:NotBlank private val region: String,
        private val endpoint: URI?
    ) {

        internal fun getRegion() = region
        internal fun getEndpoint() = endpoint

        override fun toString(): String {
            return "S3ClientConfigurationProperties(" +
                    "region=${getRegion()}, " +
                    "endpoint=${getEndpoint()}, " +
                    ")"
        }
    }
}
Bennett-Lynch commented 2 years ago

Hi @pkgonan. For error messages that include Channel was closed before it could be written to specifically, it is likely related to a pending fix that is currently being worked on in https://github.com/aws/aws-sdk-java-v2/pull/2883 (and I believe this is unrelated to the issue that this thread was originally created for). Please follow that PR and let us know if it resolves your issue once it is released.