Closed: oripwk closed this issue 4 years ago
observed the same using getMetricData
1) How often are you seeing this issue? 2) Do you have test code to consistently repro the issue?
As discussed in #452, we made some fixes which reduced this error occurrence a lot, but there is still a small chance of seeing this error. Code to repro will help us debug the issue further. I would also suggest trying the latest version to see if there is any improvement.
We have the same issue as well. We upgraded the AWS SDK to the latest version 2.7.32 in the hope that the issue would be fixed, but unfortunately it didn't help. This issue happens when the application runs for a long time without a restart; as soon as the application is restarted, we don't see the issue anymore until the next prolonged run. It usually happens about 24 hours after the application starts. Unfortunately I do not have reproducible code; we just use the default metrics publishing configuration.
@varunnvs92
I think that because it's reproducing for other people and it's very similar to https://github.com/aws/aws-sdk-java-v2/issues/452, it deserves more attention.
We've found that the only solution for now to this issue and #452 is to use the sync clients, which use the Apache HTTP client instead of Netty.
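For us that means building the client roughly like this (a minimal sketch; the Apache-based client is the default HTTP client for the sync builder, shown explicitly here for clarity):

import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;

CloudWatchClient cloudWatch = CloudWatchClient.builder()
        .httpClientBuilder(ApacheHttpClient.builder())
        .build();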
Apologies for the delayed response. Unfortunately, I'm not able to reproduce the issue. Can you provide more details on how to reproduce it? For example, the APIs you are using and the number of requests per second.
I can see "Server failed to send complete response" errors with retry disabled, because it's possible that the connection gets closed remotely mid-request; normally the next retry should pick up a new connection and succeed. Is there any chance you disabled retries on the SDK client? Can you provide a code snippet of how you create the client?
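For reference, retries would only be off if they had been disabled explicitly, which would look roughly like this (a sketch; RetryPolicy.none() is what turns retries off):

import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;

// With no retries, a single remotely-closed connection surfaces directly as
// "Server failed to send complete response" instead of being retried.
CloudWatchAsyncClient noRetryClient = CloudWatchAsyncClient.builder()
        .overrideConfiguration(ClientOverrideConfiguration.builder()
                .retryPolicy(RetryPolicy.none())
                .build())
        .build();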
We are using the default config and default connection parameters:
CloudWatchAsyncClient.builder().build()
And it's not single or random requests that fail. Like @oripwk said, it's a constant failure; all requests fail. We are not triggering it very heavily either: bursts of about 5 requests per second, once a minute. And it's surely related to the HTTP client, as there are zero problems with the Apache HTTP client.
@trautonen Thank you for sharing the info. Unfortunately, still no luck reproducing the issue.
I tried sending 5 concurrent requests every minute for a total of 20 minutes and all requests succeeded; I did notice that some requests failed on the first attempt but succeeded on the next retry.
Is there any way you can provide debug logs? I just wanted to check whether all retry attempts failed for the same reason in your use case.
StabilityTestRunner.newRunner()
.testName("CloudWatchAsyncStabilityTest." + testName)
.futureFactory(() -> cloudWatchAsyncClient.putMetricData(b -> b.namespace(namespace)
.metricData(metrics)))
.totalRuns(20)
.requestCountPerRun(5)
.delaysBetweenEachRun(Duration.ofMinutes(1))
.run();
Could you try setting connectionMaxIdleTime to 5 seconds? In my tests, reducing the max idle time largely reduced the chances of an idle connection getting closed mid-request on the service side.
CloudWatchAsyncClient.builder()
.httpClientBuilder(NettyNioAsyncHttpClient.builder()
.connectionMaxIdleTime(Duration.ofSeconds(5)))
.build();
We have pushed out the change to update the default connectionMaxIdleTime for the CloudWatch client via #1450. The change has been released as part of 2.9.14.
I got a similar issue (exactly the same exception) and I wonder if it will also be fixed by this.
I create a CloudWatch client like this:
private static final AwsCredentialsProvider CREDENTIALS_PROVIDER = AwsCredentialsProviderChain.of(
ProfileCredentialsProvider.create(),
EnvironmentVariableCredentialsProvider.create(),
ContainerCredentialsProvider.builder().asyncCredentialUpdateEnabled(true).build(),
InstanceProfileCredentialsProvider.create()
);
private static CloudWatchAsyncClient initializeCloudWatchClient() {
return CloudWatchAsyncClient.builder()
.credentialsProvider(CREDENTIALS_PROVIDER)
.build();
}
My metrics are sent properly for a few days and then suddenly, in the middle of the night, most of the requests start failing with the same exception the author mentioned. The traffic levels are actually lower during the night, so it seems to be a slightly different scenario.
I will try the new version either way, but if you see anything obvious that I'm missing let me know.
Hi @sycyhy, have you tried the new version? Are you still experiencing the issue?
Hi @zoewangg - yep, it's in production right now, but I will only be able to confirm after the weekend, because during the week there are a number of deployments and the issue pops up when you leave the app without any restarts for a few days :)
Seeing the same issue when using it with KCL, under no particular load.
@kaffepanna Are you still experiencing the issue with the latest version?
@zoewangg using version 2.9.20
Yep, my issue is still present - updating the library to 2.9.14 sadly did not help
@sycyhy Thank you for reporting back. Looks like the issue occurs after letting the app run for a while. Did it ever recover after the errors, or did all subsequent requests fail?
I'll continue investigating.
@zoewangg nope, it does not recover; the app needs to be restarted. I also just noticed that the IOException changed slightly with the new version:
Caused by: s.a.a.c.e.SdkClientException: null
at s.a.a.c.e.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at s.a.a.c.i.u.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at s.a.a.c.i.h.p.s.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
... 35 common frames omitted
Caused by: java.io.IOException: Channel was closed before it could be written to.
... 28 common frames omitted
Were all errors "IOException: Channel was closed before it could be written to" or just some of them?
They were caused by the same underlying reason: the server closed the connection remotely in the middle of the request. But for this error, it seems the connection was closed before the SDK was able to write the request.
Yep, you are right, only some of them (actually only 2 of those, compared to hundreds of the "Server failed to send complete response" errors).
Still not able to reproduce the exact all-requests-fail issue, but I found one issue in the SDK: in certain cases when Connection: close is present in the response header, the SDK does not close the connection as expected and will try to reuse that connection for other requests. Thus, if the next request picks up that bad connection right before the service closes it, an IOException will be thrown once the connection is closed remotely.
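Until that fix is out, one way to lower the chances of reusing a stale connection is to cap how long pooled connections are kept, along the lines of the earlier connectionMaxIdleTime suggestion (a sketch; the exact durations are arbitrary):

import java.time.Duration;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;

CloudWatchAsyncClient client = CloudWatchAsyncClient.builder()
        .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                // drop idle connections quickly so a connection the service is about
                // to close is less likely to be picked up for a new request
                .connectionMaxIdleTime(Duration.ofSeconds(5))
                // also cap the total lifetime of pooled connections
                .connectionTimeToLive(Duration.ofSeconds(30)))
        .build();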
@sycyhy, thanks for raising this; we're seeing a very similar issue, and thanks to all who are working to resolve it. Lukasz, I'm wondering whether you are seeing any associated CPU spikes when this problem arises?
Hi @sycyhy @robinspollak @kaffepanna,
The fix has been released as part of 2.9.24. Could you try the latest version and see if the issue is resolved for your use case? I was not able to reproduce the exact issues you were seeing, so it would be great if you could help verify the fix.
@zoewangg, 2.9.24 is in production now and I'll report back! Thanks very much.
@zoewangg, unfortunately we're still seeing the following error:
java.lang.RuntimeException: software.amazon.awssdk.core.exception.SdkClientException
at software.amazon.kinesis.retrieval.AWSExceptionManager.apply(AWSExceptionManager.java:65)
at software.amazon.kinesis.metrics.CloudWatchMetricsPublisher.blockingExecute(CloudWatchMetricsPublisher.java:89)
at software.amazon.kinesis.metrics.CloudWatchMetricsPublisher.publishMetrics(CloudWatchMetricsPublisher.java:74)
at software.amazon.kinesis.metrics.CloudWatchPublisherRunnable.runOnce(CloudWatchPublisherRunnable.java:138)
at software.amazon.kinesis.metrics.CloudWatchPublisherRunnable.run(CloudWatchPublisherRunnable.java:84)
at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.lambda$notifyIfResponseNotCompleted$2(ResponseHandler.java:391)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:174)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:391)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelInactive(ResponseHandler.java:142)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
...
It is the same as the one we were seeing before the new release; let me know if a full stack trace or a sample of how we are setting up our Kinesis client would help.
@robinspollak Thanks for reporting back. Yes, it'd be helpful if you could share the sample code of how you set up the CloudWatch client. Did all requests fail, or just some of them? Are you running it on an EC2 instance? If so, could you share what type that instance is?
@zoewangg We set up our (Scala) KCL integration as follows:
class KinesisConfigurationProvider @Inject()(
configuration: Configuration,
recordProcessorFactory: ShardRecordProcessorFactory
) {
def apply(streamName: String): ConfigsBuilder = {
val workerId = InetAddress.getLocalHost.getCanonicalHostName + ":" + UUID
.randomUUID()
val kinesisAsyncClient = KinesisAsyncClient.builder().region(Region.EU_WEST_1).build()
val dynamoDBClient = DynamoDbAsyncClient.builder().region(Region.EU_WEST_1).build()
val cloudWatchClient = CloudWatchAsyncClient.builder().region(Region.EU_WEST_1).build()
new ConfigsBuilder(
streamName,
s"app-$streamName-processor",
kinesisAsyncClient,
dynamoDBClient,
cloudWatchClient,
workerId,
recordProcessorFactory
)
}
}
It seems that only some requests fail. Upon deploy or reboot we won't see the log lines for about 3 hours. They then escalate in frequency for around 24 hours and, eventually, the server stops responding and needs to be rebooted. Our servers run on m4.xlarge EC2 instances. Hope this helps; I'm happy to provide more information.
@robinspollak Thanks for the info. A few follow-up questions: did you see any other non-CloudWatch errors before the server stopped responding? When the server stopped responding, did all requests fail with "Server failed to send complete response", or did it just get stuck somehow? What KCL version are you using?
@sycyhy are you also using KCL or just using cloudwatch async client?
Hi @zoewangg, sorry for the delayed response. We don't really see any other non-CloudWatch errors, but we're also trying our best to determine with certainty that CloudWatch is causing the problems. Thus far, we haven't been able to rule out another source of the problem.
In terms of the server behavior when it stops responding: the entire server simply ceases to respond to any requests, so there are no error responses at all. We're using KCL 2.2.4.
Thanks again for all your help.
@zoewangg I use it via Micrometer, not KCL, which is a layer of abstraction over the async client; I pass the client in manually when I configure the reporter.
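The wiring is roughly this (a simplified sketch, assuming the micrometer-registry-cloudwatch2 module; the namespace is made up):

import io.micrometer.cloudwatch2.CloudWatchConfig;
import io.micrometer.cloudwatch2.CloudWatchMeterRegistry;
import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.MeterRegistry;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;

CloudWatchAsyncClient cloudWatchAsyncClient = CloudWatchAsyncClient.builder().build();

CloudWatchConfig config = new CloudWatchConfig() {
    @Override
    public String get(String key) {
        return null; // fall back to defaults for step interval, batch size, etc.
    }

    @Override
    public String namespace() {
        return "my-app"; // made-up namespace
    }
};

// The registry publishes metrics through the async client passed in here, so any
// Netty-level connection problem surfaces as the errors described in this issue.
MeterRegistry registry = new CloudWatchMeterRegistry(config, Clock.SYSTEM, cloudWatchAsyncClient);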
btw -> I needed to revert to the legacy client due to this (I could not wake up more people during the night sadly haha).
Hi @robinspollak, for the server-not-responding issue, my guess is that it might not be related to the CloudWatch errors. I'd recommend taking a thread dump or heap dump to see if there's a blocked thread or increased resource usage causing the server to stop responding.
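For example, the JDK's jstack tool can capture a thread dump, or it can be done in-process with the standard management API (a minimal sketch):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Print a stack trace for every live thread; useful for spotting threads that are
// permanently blocked when the server stops responding.
for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
    System.out.print(info);
}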
We have pushed out a couple of fixes, #1481 and #1479 (available in 2.10.0), for issues that could cause the "Server failed to send complete response" errors. KCL has also released a new version, 2.2.5. Could you try the new versions?
@sycyhy Sorry to hear that you had to revert the client code 😞 We are currently working on setting up more test scenarios that cover more real-world use cases to catch these issues.
Closing the issue due to lack of activity. Please create a new issue if you continue to see this error with the latest SDK.
@oripwk Hi, I have read this post and others like it, and I notice that you never confirmed whether the updates to aws-sdk-java-v2 fixed your bug. I am currently using version 2.10.41 and the error persists in this version. Did you find out how to fix this bug?
This issue is exactly the same as #452, but it happens in CloudWatch instead of S3.
Expected Behavior
putMetricData(…) should not throw an exception.
Current Behavior
java.io.IOException: Server failed to send complete response is thrown several times a minute. Most of the requests seem to succeed.
Possible Solution
I have no idea.
Steps to Reproduce (for bugs)
Create a CloudWatch client that sends putMetricData requests about 100 times a minute, where each request has 20 metrics, and each metric contains 150 values.
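A rough sketch of such a client (the namespace and metric names are made up; the schedule only approximates the real traffic):

import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.cloudwatch.model.MetricDatum;

CloudWatchAsyncClient client = CloudWatchAsyncClient.builder().build();

// Roughly 100 putMetricData calls per minute, each carrying 20 metrics with 150 values.
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
    List<MetricDatum> metrics = IntStream.range(0, 20)
            .mapToObj(i -> MetricDatum.builder()
                    .metricName("TestMetric" + i)
                    .timestamp(Instant.now())
                    .values(IntStream.range(0, 150)
                            .mapToObj(v -> (double) v)
                            .collect(Collectors.toList()))
                    .build())
            .collect(Collectors.toList());

    client.putMetricData(b -> b.namespace("ReproTest").metricData(metrics))
          .whenComplete((response, error) -> {
              if (error != null) {
                  error.printStackTrace(); // "Server failed to send complete response" shows up here
              }
          });
}, 0, 600, TimeUnit.MILLISECONDS);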
Context
Monitoring a real-time streaming application.
Your Environment
Linux 4.14.62-65.117.amzn1.x86_64 x86_64 GNU/Linux