lbourdages opened this issue 4 years ago
Hi @lbourdages, thank you for being diligent and reporting this to us. I understand the issue is happening out of nowhere, but can you answer some questions to help us isolate the cause so we can try to reproduce it on our side?
Hi!
Here are a few minutes of logs. Note that I've scrubbed out the logs from our business logic, but they should not have any impact on the operation of the Netty internals anyway.
The service had been running for about 25 hours at the time the log message appeared.
Several hours before, we had two instances of this:
Apr 30, 2020 @ 01:43:18.302 WARN software.amazon.awssdk.http.nio.netty.internal.ResponseHandler Subscriber cancelled before all events were published
Apr 29, 2020 @ 22:47:53.531 WARN software.amazon.awssdk.http.nio.netty.internal.ResponseHandler Subscriber cancelled before all events were published
That's all I can find.
We just experienced this too on version 2.3.3 of the amazon-kinesis-client, after upgrading from 2.2.5. We're also seeing fairly frequent logs saying "An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception." with the following stack trace:
software.amazon.awssdk.http.nio.netty.internal.http2.Http2ConnectionTerminatingException: An I/O error occurred on an associated Http2 stream [id: 0x16f5249b, L:/10.0.2.231:44260 ! R:kinesis.eu-west-1.amazonaws.com/99.80.34.137:443](H2 - 8233)
    at software.amazon.awssdk.http.nio.netty.internal.http2.Http2StreamExceptionHandler.exceptionCaught(Http2StreamExceptionHandler.java:50)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
    at software.amazon.awssdk.http.nio.netty.internal.UnusedChannelExceptionHandler.exceptionCaught(UnusedChannelExceptionHandler.java:52)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
    at io.netty.handler.codec.http2.AbstractHttp2StreamChannel.windowUpdateFrameWriteComplete(AbstractHttp2StreamChannel.java:118)
    at io.netty.handler.codec.http2.AbstractHttp2StreamChannel.access$000(AbstractHttp2StreamChannel.java:57)
    at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe.updateLocalWindowIfNeeded(AbstractHttp2StreamChannel.java:850)
    at io.netty.handler.codec.http2.AbstractHttp2StreamChannel$Http2ChannelUnsafe.beginRead(AbstractHttp2StreamChannel.java:780)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.read(DefaultChannelPipeline.java:1362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeRead(AbstractChannelHandlerContext.java:686)
    at io.netty.channel.AbstractChannelHandlerContext.read(AbstractChannelHandlerContext.java:671)
    at software.amazon.awssdk.http.nio.netty.internal.http2.FlushOnReadHandler.read(FlushOnReadHandler.java:40)
    at io.netty.channel.AbstractChannelHandlerContext.invokeRead(AbstractChannelHandlerContext.java:686)
    at io.netty.channel.AbstractChannelHandlerContext.read(AbstractChannelHandlerContext.java:671)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.requestDemand(HandlerPublisher.java:130)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler$1.requestDemand(HttpStreamsHandler.java:191)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.flushBuffer(HandlerPublisher.java:345)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.receivedDemand(HandlerPublisher.java:291)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.access$200(HandlerPublisher.java:61)
    at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:495)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:832)
This happened 4 times in the last day. They don't seem to occur at the same time as the other error, though.
We're going to roll back to the old version now. Would appreciate any update on this.
@Bas83 thank you for reporting.
Regarding the Http2ConnectionTerminatingException, there's an open issue in the amazon-kinesis-client repository; I suggest you upvote it there too: https://github.com/awslabs/amazon-kinesis-client/issues/755
I'm facing the same issue when writing a stream of records (batch API) to Kinesis using KinesisAsyncClient (v 2.17.106/127).
The scenario is as follows: the stream is processed by x threads in parallel (Apache Spark tasks), and each thread triggers up to y concurrent requests. All threads share one instance of the async client. Currently I'm still testing locally, not even on a cluster.
Setting x in particular, but also y, too high will repeatedly fail the stream within 1 minute. Even setting numRetries unreasonably high does not prevent the issue.
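Roughly, the access pattern looks like this (a minimal sketch; the stream name, payload, thread counts, and the per-thread semaphore are illustrative placeholders, not code from the report):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry;

public class SharedClientFanOut {
    static final int X_THREADS = 8;              // parallel "tasks" (x)
    static final int Y_CONCURRENT_REQUESTS = 16; // in-flight requests per task (y)

    public static void main(String[] args) throws InterruptedException {
        // One async client shared by all threads, as described above.
        KinesisAsyncClient client = KinesisAsyncClient.create();
        ExecutorService tasks = Executors.newFixedThreadPool(X_THREADS);

        for (int t = 0; t < X_THREADS; t++) {
            tasks.submit(() -> {
                // Each task keeps up to y PutRecords calls in flight.
                Semaphore inFlight = new Semaphore(Y_CONCURRENT_REQUESTS);
                for (int i = 0; i < 1_000; i++) {
                    inFlight.acquireUninterruptibly();
                    client.putRecords(PutRecordsRequest.builder()
                                    .streamName("test-stream") // hypothetical
                                    .records(PutRecordsRequestEntry.builder()
                                            .partitionKey("pk-" + i)
                                            .data(SdkBytes.fromUtf8String("payload"))
                                            .build())
                                    .build())
                            .whenComplete((resp, err) -> inFlight.release());
                }
            });
        }
        tasks.shutdown();
        tasks.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```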
Initially I'm seeing this periodically:
22/02/11 18:28:48 WARN software.amazon.awssdk.http.nio.netty.internal.http2.MultiplexedChannelRecord: [Channel: 11fa7015] Child channel count was caught attempting to be increased over max concurrency. Please report this issue to the AWS SDK for Java team.
soon after followed by:
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The connection was closed during the request. The request will usually succeed on a retry, but if it does not: consider disabling any proxies you have configured, enabling debug logging, or performing a TCP dump to identify the root cause. If this is a streaming operation, validate that data is being read or written in a timely manner. Channel Information: ChannelDiagnostics(channel=[id: 0x551a2dfd, L:/192.168.178.20:50484 - R:kinesis.eu-central-1.amazonaws.com/3.123.12.193:443](H2 - 7), channelAge=PT0.001S, requestCount=1) Parent Channel Information: ChannelDiagnostics(channel=[id: 0x551a2dfd, L:/192.168.178.20:50484 - R:kinesis.eu-central-1.amazonaws.com/3.123.12.193:443], channelAge=PT10.166S, requestCount=0)
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
An interesting observation is that for x <= 2, even a high y works well. A higher x, on the other hand, quickly leads to the issue above if y >= 2.
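If the total in-flight load is the trigger, one hedged mitigation (a workaround sketch, not a fix for the SDK warning) is to give the shared client an explicit concurrency ceiling and keep x * y below it. maxConcurrency is a real setting on the Netty HTTP client builder; the value 50 here is arbitrary:

```java
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;

public class CappedConcurrencyClient {
    public static KinesisAsyncClient build() {
        // Cap concurrent requests on the shared Netty client; tune the value
        // so that x * y in-flight requests stay below the ceiling.
        return KinesisAsyncClient.builder()
                .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                        .maxConcurrency(50)) // arbitrary example value
                .build();
    }
}
```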
@debora-ito I tried looking into the issue. The strange thing is, when using a custom build of netty-nio-client (tag 2.17.127, openjdk 1.8.0_322) everything works fine. As soon as I use the jar published to Maven, I reliably run into the issue above. But I don't see any difference except for some build metadata.
@mosche was the custom build of netty-nio-client built by you/your team? How are you checking for differences?
@debora-ito All I did was run mvn clean install -pl :netty-nio-client -P quick -am on the original code, though it turns out I didn't pay close enough attention to transitive dependencies.
Digging further, this turns out to be a classpath issue with different transitive versions of Netty. In my case Spark 2.4.7 uses io.netty:netty-all:4.1.47.Final, which is a fat jar containing the classes of all Netty submodules. netty-all:4.1.72.Final, by contrast, has become an uber jar that only declares dependencies on the individual modules. Mixing the two causes incompatible classes to be loaded at runtime, which triggers the issue.
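A quick, hedged way to check for such a mix at runtime is Netty's own version metadata (a generic diagnostic sketch, not code from this report):

```java
import io.netty.util.Version;

public class NettyVersionCheck {
    public static void main(String[] args) {
        // Version.identify() lists each Netty artifact found on the classpath
        // together with the version recorded in its build metadata.
        Version.identify().forEach((artifact, version) ->
                System.out.println(artifact + " -> " + version.artifactVersion()));

        // Also show which jar a representative class was actually loaded from;
        // with a netty-all/submodule mix this can reveal a stale fat jar.
        System.out.println(io.netty.channel.Channel.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}
```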
In my case, forcing any consistent version of Netty solves the issue:
```diff
 io.netty:netty:3.9.9.Final
-io.netty:netty-all:4.1.47.Final
+io.netty:netty-all:4.1.72.Final
 io.netty:netty-buffer:4.1.72.Final
 io.netty:netty-codec:4.1.72.Final
+io.netty:netty-codec-dns:4.1.72.Final
+io.netty:netty-codec-haproxy:4.1.72.Final
 io.netty:netty-codec-http:4.1.72.Final
 io.netty:netty-codec-http2:4.1.72.Final
+io.netty:netty-codec-memcache:4.1.72.Final
+io.netty:netty-codec-mqtt:4.1.72.Final
+io.netty:netty-codec-redis:4.1.72.Final
+io.netty:netty-codec-smtp:4.1.72.Final
+io.netty:netty-codec-socks:4.1.72.Final
+io.netty:netty-codec-stomp:4.1.72.Final
+io.netty:netty-codec-xml:4.1.72.Final
 io.netty:netty-common:4.1.72.Final
 io.netty:netty-handler:4.1.72.Final
+io.netty:netty-handler-proxy:4.1.72.Final
 io.netty:netty-resolver:4.1.72.Final
+io.netty:netty-resolver-dns:4.1.72.Final
+io.netty:netty-resolver-dns-classes-macos:4.1.72.Final
+io.netty:netty-resolver-dns-native-macos:4.1.72.Final
 io.netty:netty-tcnative-classes:2.0.46.Final
 io.netty:netty-transport:4.1.72.Final
 io.netty:netty-transport-classes-epoll:4.1.72.Final
+io.netty:netty-transport-classes-kqueue:4.1.72.Final
+io.netty:netty-transport-native-epoll:4.1.72.Final
+io.netty:netty-transport-native-kqueue:4.1.72.Final
 io.netty:netty-transport-native-unix-common:4.1.72.Final
+io.netty:netty-transport-rxtx:4.1.72.Final
+io.netty:netty-transport-sctp:4.1.72.Final
+io.netty:netty-transport-udt:4.1.72.Final
```
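For reference, one hedged way to force a consistent Netty version in a Maven build is importing the Netty BOM in dependencyManagement (a sketch; 4.1.72.Final is just the version from the diff above, and other build tools have equivalent resolution-strategy pins):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pins every io.netty artifact (netty-all and the individual
         submodules) to a single version across the dependency tree. -->
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-bom</artifactId>
      <version>4.1.72.Final</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```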
Looks like 4.1.69.Final is the first correct uber jar that doesn't contain classes itself ...
@debora-ito Also wondering, is com.typesafe.netty:netty-reactive-streams-http an intentional dependency of netty-nio-client?
@lbourdages @Bas83 we want to investigate this error further, but we need sample code so we can reproduce it on our side.
Do you have self-contained repro code that reliably shows the error, using the latest version of the SDK?
I'm afraid we don't; it's been a while, and we've been running version 2.4.3 for a while now. The only problem we have with that version (or rather with 2.4.1 before it, but we haven't tried again since) is that updating to Java 17 (from 16) mysteriously makes Kinesis skip part of the records while consuming.
I found a log at WARN level that told me to contact the Java SDK team, so here's the bug report.
Describe the bug
We run a Kinesis Client Library application. During what appears to be normal operation, a log message was printed telling me to contact the Java SDK team.
Expected Behavior
N/A
Current Behavior
Here's the log in its entirety:
The logger name is "software.amazon.awssdk.http.nio.netty.internal.http2.MultiplexedChannelRecord"
Steps to Reproduce
It happened out of nowhere, so I don't know.
Possible Solution
N/A
Context
KCL github: https://github.com/awslabs/amazon-kinesis-client
Your Environment