Open vtlkvl opened 5 years ago
Is it possible that this causes a shardRecordProcessor to keep executing ProcessTasks instead of finishing shutdown?
Not sure, at least we haven't observed it. Based on my personal analysis of the code, it should not happen.
I think some shard consumer is stuck in a failure loop and holding up the graceful shutdown. See #616 for ideas to debug this further.
I believe this is related to https://github.com/awslabs/amazon-kinesis-client/pull/1302
When graceful shutdown is requested via a `Scheduler.startGracefulShutdown` call, it often happens that all active leases get removed from `Scheduler.shardInfoShardConsumerMap` before shutdown of record processors is complete and `GracefulShutdownContext.shutdownCompleteLatch` gets down to 0. This leads to a problem in `GracefulShutdownCallable.waitForRecordProcessors`:

Under normal conditions the shutdown complete latch should eventually count down to 0 and the future returned by `Scheduler.startGracefulShutdown` should yield `true`. Because of a race condition, the shutdown complete latch holds a non-zero value and `GracefulShutdownCallable.workerShutdownWithRemaining` returns `true` because `Scheduler.shardInfoShardConsumerMap` is already empty at this point while the Scheduler has not finished its shutdown process. As a result the future returned by `Scheduler.startGracefulShutdown` yields `false`.

As a workaround, to get notified about shutdown completion it is required to check `Scheduler.shutdownComplete` in a loop until it returns `true`.
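For anyone who needs the workaround, here is a minimal sketch of the polling loop. It is not taken from the KCL code base: `ShutdownWaiter`, `awaitShutdown`, the timeout and the poll interval are all hypothetical names and values chosen for illustration. The check itself is passed in as a `BooleanSupplier` so the snippet compiles without the KCL dependency.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a "shutdown complete" check until it is true or a timeout expires. */
public final class ShutdownWaiter {
    private ShutdownWaiter() {}

    /**
     * @param shutdownComplete check to poll (assumed to be backed by Scheduler.shutdownComplete)
     * @param timeout          upper bound on the total wait (an assumption, pick your own)
     * @param pollInterval     pause between two checks
     * @return true if the check returned true before the timeout, false otherwise
     */
    public static boolean awaitShutdown(BooleanSupplier shutdownComplete,
                                        Duration timeout,
                                        Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!shutdownComplete.getAsBoolean()) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // gave up: shutdown did not complete in time
            }
            // Sleep for the poll interval, but never past the deadline and never less than 1 ms.
            long sleepMillis = Math.max(1L,
                    Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L));
            Thread.sleep(sleepMillis);
        }
        return true;
    }
}
```

With a real scheduler this would presumably be called right after `scheduler.startGracefulShutdown()`, for example as `ShutdownWaiter.awaitShutdown(scheduler::shutdownComplete, Duration.ofMinutes(1), Duration.ofMillis(200))`, ignoring the value of the returned future. The bounded timeout is there so that a consumer stuck in a failure loop cannot block the caller forever.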