awslabs / amazon-kinesis-client

Client library for Amazon Kinesis
Apache License 2.0

Error KCL core.exception.SdkClientException: PrefetchRecordsPublisher #717

Open kheraankit opened 4 years ago

kheraankit commented 4 years ago

Getting a lot of these errors randomly on some hosts

'amazon-kinesis-client', version: '2.2.7'

    2020-05-07 19:51:45 ERROR s.a.k.r.p.PrefetchRecordsPublisher:454 - 0 0 - shardId-000000000109 : Exception thrown while fetching records from Kinesis
    software.amazon.awssdk.core.exception.SdkClientException: The channel was closed. This may have been done by the client (e.g. because the request was aborted), by the service (e.g. because the request took too long or the client tried to write on a read-only socket), or by an intermediary party (e.g. because the channel was idle for too long).
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
        at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:108)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor$$Lambda$826.00000000D43D0F30.accept(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$WrappedErrorForwardingResponseHandler.onError(MakeAsyncHttpRequestStage.java:124)
        at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:267)
        at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:141)
        at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$$Lambda$821.00000000D43CF090.operationComplete(Unknown Source)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
        at io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35)
        at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:501)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Unknown Source)

ashwing commented 4 years ago

Hi, these exceptions are thrown from the AWS SDK's connection management layer. How frequent are they? And does this affect your data consumption or the propagation delay of your records?

kheraankit commented 4 years ago

@ashwing This seems to happen sporadically on pods. While troubleshooting, we observed that the pod's logs were filled with these errors. The client does not seem able to reconnect once it gets into this state. For now, the only way to get rid of the errors is to recycle the pod :(.

kheraankit commented 4 years ago

Has anyone faced this issue before, any recommendations?

kheraankit commented 4 years ago

Can someone help take a look?

Jijii commented 4 years ago

I have the exact same problem. It happens sporadically, most recently about 4 days apart. I am also using version 2.2.7.

priyath commented 3 years ago

I have observed this on KCL 2.3 as well. Any update on this?

matsev commented 3 years ago

Any updates regarding this issue?

I have copied the code from the amazon-kinesis-learning repo and it fails in a similar way when attempting to validate the stream:

    private static void validateStream(KinesisAsyncClient kinesisClient, String streamName) {
        try {
            DescribeStreamRequest describeStreamRequest = DescribeStreamRequest.builder().streamName(streamName).build();
            DescribeStreamResponse describeStreamResponse = kinesisClient.describeStream(describeStreamRequest).get();
            if (!describeStreamResponse.streamDescription().streamStatus().toString().equals("ACTIVE")) {
                System.err.println("Stream " + streamName + " is not active. Please wait a few moments and try again.");
                System.exit(1);
            }
        } catch (Exception e) {
            System.err.println("Error found while describing the stream " + streamName);
            System.err.println(e);
            System.exit(1);
        }
    }

I instantiate the KinesisAsyncClient in the same way as the example code:

    KinesisAsyncClient kinesisClient = KinesisClientUtil.createKinesisAsyncClient(KinesisAsyncClient.builder().region(region));

Dependency: software.amazon.kinesis:amazon-kinesis-client:2.3.3


Update:

It turned out that another dependency caused this error. Upgrading compile 'io.netty:netty-tcnative-boringssl-static:2.0.26.Final' to compile 'io.netty:netty-tcnative-boringssl-static:2.0.36.Final' resolved my problem.
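
For anyone trying to confirm whether a similar netty-tcnative mismatch is in play, a minimal sketch like the one below can report which JAR a class resolves to on the running classpath. The helper class `DependencyVersionCheck` and its `describe` method are hypothetical names introduced here for illustration, and the two probed class names (`io.netty.internal.tcnative.SSL` from netty-tcnative and `io.netty.handler.ssl.OpenSsl` from netty-handler) are an assumption about which classes are worth checking. The reported version comes from the JAR manifest and may be null if the manifest does not declare one.

```java
import java.security.CodeSource;

final class DependencyVersionCheck {

    // Reports the manifest version and JAR location of a class.
    // The class is looked up without running its static initializers.
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className, false,
                    DependencyVersionCheck.class.getClassLoader());
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            String location = (source == null || source.getLocation() == null)
                    ? "unknown location"
                    : source.getLocation().toString();
            return className + ": version=" + version + ", loaded from " + location;
        } catch (ClassNotFoundException e) {
            return className + ": not on classpath";
        } catch (LinkageError e) {
            return className + ": could not be loaded (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed class names; adjust to the dependencies under suspicion.
        System.out.println(describe("io.netty.internal.tcnative.SSL"));
        System.out.println(describe("io.netty.handler.ssl.OpenSsl"));
    }
}
```

Running this inside the affected application (or with the same classpath) shows the JAR file name each class is loaded from, which usually includes the artifact version. If the netty-tcnative JAR is older than the one the other Netty artifacts expect, that points to the kind of mismatch described above.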