awslabs / amazon-kinesis-client

Client library for Amazon Kinesis
Apache License 2.0

Kinesis Client Library Checkpoint exceptions #732

Open SuprithaMundaragi opened 4 years ago

SuprithaMundaragi commented 4 years ago

We use DynamoDB Streams on a DynamoDB table. Often, especially on node restart, we get exceptions like the following, and they do not stop:

com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Unable to fetch checkpoint for shardId shardId-00000001596757965421-84b48e8b
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator.getCheckpointObject(KinesisClientLibLeaseCoordinator.java:286) ~[amazon-kinesis-client-1.13.3.jar!/:?]
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitializeTask.call(InitializeTask.java:82) [amazon-kinesis-client-1.13.3.jar!/:?]
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49) [amazon-kinesis-client-1.13.3.jar!/:?]
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24) [amazon-kinesis-client-1.13.3.jar!/:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]

It cannot find checkpoints for the shards. We currently run only one worker per instance, and the initial stream position is LATEST. Restarting the instance sometimes helps, but that defeats the auto-scaling policy we have for the microservice. Is there a resolution for this?

anekar3416 commented 2 years ago

@SuprithaMundaragi Did you find any solution for this?

neli-kh commented 1 year ago

Any updates? We are seeing the same issue on and off between deployments.

Gupastha commented 1 year ago

Same issue here. Redeploying the application fixes it, but that is not something we would like to rely on.

tinder-raipankaj commented 3 months ago

Hey folks, just checking in—has this issue earned its retirement yet? My KCL is still throwing tantrums (esp. during redeploys) over fetching checkpoints, and it's getting harder to calm it down. Any updates or magic tricks to finally put this one to bed?