aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.16k stars 835 forks source link

Rapid S3 downloads causes failure #1122

Closed oliverdain closed 5 years ago

oliverdain commented 5 years ago

I have code using the Java SDK to download a few hundred photos from an S3 bucket. The first bunch work but things start to fail around photo number 100. It does not always fail on the same file. It seems very similar to #452 though that issue is closed. Note that I'm using S3AsyncClient but downloading the file sequentially, never downloading more than one concurrently.

The photos are about 500KB each.

Expected Behavior

I should be able to download all the objects in my bucket

Current Behavior

Am able to download many files but after about 100 requests star to fail. The exception is:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
    at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:50)
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:167)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:118)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:103)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:241)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$3(ResponseHandler.java:398)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:179)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:398)
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:148)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
    at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
    at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:117)
    ... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
    at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:397)
    ... 54 common frames omitted

With full debug logs enabled there's a ton of output but the response from the Netty network logs from right before things crash is:

11:16:18.311 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 - R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] READ: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 254, cap: 254/254, unwrapped: PooledUnsafeDirectByteBuf(ridx: 8446, widx: 8446, cap: 8475)), decoderResult: success), 254B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 5d 52 e1 13 91 3f b5 10 51 7b 34 f6 79 a0 39 aa |]R...?..Q{4.y.9.|
|00000010| 3a 4d fc ce 3c 68 3b 15 b5 ad 02 63 d4 f7 a0 78 |:M..<h;....c...x|
|00000020| 92 29 39 d5 39 01 a9 44 b9 c6 33 2a 8d 7e 91 f3 |.)9.9..D..3*.~..|
|00000030| 28 ea 71 58 a6 09 25 58 2f a8 4e 94 f1 24 02 65 |(.qX..%X/.N..$.e|
|00000040| a7 b5 48 11 7c 6a 95 05 d2 a8 b0 93 a1 02 d8 57 |..H.|j.........W|
|00000050| 11 04 21 ad 32 b4 2d 94 16 99 af 0e f4 00 41 2a |..!.2.-.......A*|
|00000060| 80 04 55 04 32 94 41 38 93 40 01 11 25 02 94 89 |..U.2.A8.@..%...|
|00000070| 1a 39 2a 2d 84 b0 04 12 d7 2a cd 22 60 a7 34 27 |.9*-.....*."`.4'|
|00000080| 40 94 0f 0d 0d 53 0d 85 38 5d bb 24 ea af 94 32 |@....S..8].$...2|
|00000090| 84 80 24 ad 3a 63 0c e8 f8 77 85 44 8a a0 14 1a |..$.:c...w.D....|
|000000a0| 02 74 27 51 ab e6 25 02 c6 4e a8 ed 4e 64 68 4a |.t'Q..%..N..NdhJ|
|000000b0| 4e 84 09 cf 8c 7a 0e f4 65 4c f8 2b 00 28 14 f1 |N....z..eL.+.(..|
|000000c0| 98 f0 54 0a a2 a0 24 9b 48 42 2e ed 20 ad ae 53 |..T...$.HB.. ..S|
|000000d0| a8 f0 12 a9 80 9a 05 11 a3 53 2a 72 4f 3d 70 ae |.........S*rO=p.|
|000000e0| 07 9a 61 20 27 e7 8c 27 27 2d 6a a3 ff d9 f2 d3 |..a '..''-j.....|
|000000f0| 22 9e 33 37 d2 2e 06 03 e0 06 a6 fc 72 02       |".37........r.  |
+--------+-------------------------------------------------+----------------+
11:16:18.312 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] READ COMPLETE
11:16:18.312 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] INACTIVE
11:16:18.316 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] CLOSE
11:16:18.316 [aws-java-sdk-NettyEventLoop-1-3] DEBUG i.n.handler.logging.LoggingHandler - [id: 0xffce7082, L:/192.168.0.165:59528 ! R:com-revl-revlx.s3.us-west-2.amazonaws.com/52.218.224.248:443] UNREGISTERED

Steps to Reproduce (for bugs)

Here is a slightly modified version of the Kotlin code that produces the above (modified to remove company sensitive stuff):

object Main {
    val log = getLogger { }

    val s3AsyncClient = S3AsyncClient.builder()
        .credentialsProvider(DefaultCredentialsProvider.create())
        .region(Region.US_WEST_2)
        // retries disable here because we can't retry in our situation - the real use case is to stream
        // data from a server to a customer. Note that you get the same failures even with retries -- it
        // just takes longer
        .overrideConfiguration { config ->
            config.retryPolicy(RetryPolicy.none()).build()
        }
        .build()

    val daBucket = "com-revl-revlx"

    fun doDownload(key: String) {
        val outFile = Paths.get("/tmp/experiment/").resolve(key)
        outFile.parent.toFile().mkdirs()
        if (outFile.toFile().exists()) {
            outFile.toFile().delete()
        }
        log.info("Downloading via toFile: {}", key)
        val request = GetObjectRequest.builder()
            .bucket(daBucket)
            .key(key)
            .build()

        val future = s3AsyncClient.getObject(request, AsyncResponseTransformer.toFile(outFile))

        try {
            future.join()
            log.info("Download of {} succeeded", key)
        } catch (t: Throwable) {
            log.error("Download of {} failed with:", key, t)
            throw t
        }
    }

    @JvmStatic
    fun main(argv: Array<String>) {
        ThreadUtils.setLoggingDefaultExceptionHandler()
        val keys = (1..240).toList()

        var numDone = 1
        for (i in keys) {
            log.info("Process file number {}", numDone)
            ++numDone
            val key = String.format("path/to/photo_%05d.jpg", i)
            doDownload(key)
        }
    }
}

Context

We provide photos to our customers and are trying to provide a "download all" function. To do this we need to stream the data from S3 through some code that incrementally adds it to a zip file which is provided to the customer. Since there's so much data we can't hold it all in RAM so we need to do this async-streaming processing. We have our own AsyncResponseTransformer which does this but wanted to provide an example with as little of our code as possible. AsyncResponseTransformer.toFile and our code experience the same error.

Your Environment