rwightman closed this issue 4 years ago.
Looks like a duplicate of: https://github.com/awslabs/amazon-kinesis-client/issues/55
@klesniewski yes, I noticed that issue. In my situation specifically, though, I wasn't seeing the warning with any sort of frequency until after the problems you noted with the streams getting stuck were fixed, suggesting that perhaps the sequence of events that was causing the stuck shards fairly reliably puts things in a state where the shardId warnings are triggered...
From what I understood from awslabs/amazon-kinesis-client#55, the issue was introduced with KinesisProxy - the same one that was used to resolve #20. The problem seems to be caused by KinesisProxy keeping a cached list of shards and not refreshing it on lease steal.
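To illustrate the failure mode (a purely hypothetical sketch, not the actual KinesisProxy code - the class and method names below are made up): if the shard list is cached once at startup and never refreshed, a shard that only becomes relevant later, e.g. after a lease steal, is never found, whereas refreshing the cache on a miss avoids that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a stale shard cache; not the real KinesisProxy.
class ShardCache {

    /** Stand-in for whatever actually lists shards (e.g. via DescribeStream). */
    interface ShardLister {
        Map<String, String> listShards();
    }

    private final Map<String, String> shardsById = new ConcurrentHashMap<>();
    private final ShardLister lister;

    ShardCache(ShardLister lister) {
        this.lister = lister;
        refresh(); // populated once at construction time
    }

    /** Buggy variant: a shard that appears after construction is never found. */
    String getShardStale(String shardId) {
        return shardsById.get(shardId);
    }

    /** Fixed variant: on a cache miss, re-list the shards before giving up. */
    String getShard(String shardId) {
        String shard = shardsById.get(shardId);
        if (shard == null) {
            refresh();
            shard = shardsById.get(shardId);
        }
        return shard; // still null only if the shard genuinely does not exist
    }

    private void refresh() {
        shardsById.putAll(lister.listShards());
    }
}
```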
I think I have the same issue, although we also see non-stop ERROR-level spam like:
```
ERROR [2020-02-27 13:02:45,382] [RecordProcessor-2873] c.a.s.k.c.lib.worker.InitializeTask: Caught exception:
com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Unable to fetch checkpoint for shardId shardId-00000001582460850801-53f6f94b
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator.getCheckpointObject(KinesisClientLibLeaseCoordinator.java:286)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitializeTask.call(InitializeTask.java:82)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
I was having issues as per #20 for the longest time. Finally, with the latest updates to this library and KCL 1.9.2, I no longer seem to be getting stuck streams.
However, I am constantly seeing these logs spamming at a warning level:
```
2018-10-23 17:41:56 WARN c.a.s.d.s.DynamoDBStreamsProxy - Cannot find the shard given the shardId shardId-xxxxx
2018-10-23 17:41:56 WARN c.a.s.k.c.lib.worker.ProcessTask - Cannot get the shard for this ProcessTask, so duplicate KPL user records in the event of resharding will not be dropped during deaggregation of Amazon Kinesis records.
```
I've looked through some issues in the KCL repository, but found nothing providing a solid answer to my situation. It seems somehow related to the way the DynamoDB Streams proxy works, as I wasn't seeing this log spam on the KCL side until I updated to the latest version of this code and started using the new construction method.
I've checked the leases in DynamoDB, and I've often seen only one lease entry for the shard being complained about. It's a very simple setup right now, based on the sample code: two worker threads in a single process, processing the stream, with usually only one shard.
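Roughly, the setup looks like the sketch below - not my actual code, just an illustration of this kind of setup using the StreamsWorkerFactory construction path from the README. The application name, stream ARN, and worker IDs are placeholders, the record processor is a no-op stand-in, and exact overloads may differ between adapter/KCL versions.

```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreams;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorkerFactory;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;
import com.amazonaws.services.kinesis.clientlibrary.types.InitializationInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ProcessRecordsInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownInput;

public class TwoWorkerSetup {
    public static void main(String[] args) {
        // Placeholder stream ARN; the real one comes from DescribeTable.
        String streamArn = "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2018-01-01T00:00:00.000";

        AmazonDynamoDB dynamoDb = AmazonDynamoDBClientBuilder.defaultClient();
        AmazonDynamoDBStreams streams = AmazonDynamoDBStreamsClientBuilder.defaultClient();
        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();
        AmazonDynamoDBStreamsAdapterClient adapter = new AmazonDynamoDBStreamsAdapterClient(streams);

        // No-op record processor; real processing and checkpointing go here.
        IRecordProcessorFactory factory = () -> new IRecordProcessor() {
            @Override public void initialize(InitializationInput input) { }
            @Override public void processRecords(ProcessRecordsInput input) { }
            @Override public void shutdown(ShutdownInput input) { }
        };

        // Two workers in one process, sharing the same lease table, so leases
        // can move back and forth between them.
        for (String workerId : new String[] { "worker-1", "worker-2" }) {
            KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                    "my-streams-app", streamArn,
                    DefaultAWSCredentialsProviderChain.getInstance(), workerId)
                    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);
            Worker worker = StreamsWorkerFactory.createDynamoDbStreamsWorker(
                    factory, config, adapter, dynamoDb, cloudWatch);
            new Thread(worker, "kcl-" + workerId).start();
        }
    }
}
```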
I've seen: