awslabs / dynamodb-streams-kinesis-adapter

The Amazon DynamoDB Streams Adapter implements the Amazon Kinesis interface so that your application can use KCL to consume and process data from a DynamoDB stream.
Apache License 2.0

Worker goes idle forever #20

Closed klesniewski closed 5 years ago

klesniewski commented 5 years ago

In one of our applications, we have observed that DynamoDB Streams processing sometimes stops until the application is restarted. The first time it happened, it caused quite a headache, because we only discovered it more than 24 hours later, when some data was no longer available in the stream. Now, with monitoring in place, we can see that it happens every few days (4 times so far). We have observed the following:

Looking at the KCL implementation, we noticed that LeaseTaker takes new leases only if they are available in the lease table. New leases are discovered and inserted into the lease table on only two occasions: on worker initialization and on reaching a shard end. We suspect that sometimes, when a shard end is reached and the shards are listed, information about the new shards is not yet available. As a result, no new shards are inserted into the lease table, and LeaseTaker never sees them. Since no shard is being consumed, no shard end is reached, no shards are ever inserted into the lease table, and the worker stays idle forever. With more than one worker instance, the problem is probably less visible, since the shards will be synced again when another worker finishes its shard, which unblocks the idle worker. Nevertheless, there will be a period during which a worker is idle because the lease table is out of sync with the actual shards.
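To make the suspected sequence concrete, here is a minimal Python sketch of it. It is not KCL code: the lease table is modeled as a plain set, and `sync_leases`, `take_lease`, and `next_lease_after_shard_end` are hypothetical names chosen for illustration. It only models the two sync points described above (initialization and shard end).

```python
from typing import Iterable, Optional, Set


def sync_leases(lease_table: Set[str], listed_shards: Iterable[str]) -> None:
    """Insert a lease for every listed shard (hypothetical stand-in for the shard sync)."""
    lease_table.update(listed_shards)


def take_lease(lease_table: Set[str], finished: Set[str]) -> Optional[str]:
    """Return a lease that still has work to do, or None if there is none."""
    for shard_id in sorted(lease_table):
        if shard_id not in finished:
            return shard_id
    return None


def next_lease_after_shard_end(
    listing_at_init: Iterable[str], listing_at_shard_end: Iterable[str]
) -> Optional[str]:
    """Simulate one worker up to the sync that follows its first shard end.

    A result of None means the worker is idle, and since syncs are only
    triggered by initialization or a shard end, nothing will wake it up.
    """
    lease_table: Set[str] = set()
    finished: Set[str] = set()

    sync_leases(lease_table, listing_at_init)       # sync point 1: initialization
    current = take_lease(lease_table, finished)
    if current is None:
        return None

    finished.add(current)                           # the worker reaches the shard end
    sync_leases(lease_table, listing_at_shard_end)  # sync point 2: shard end
    return take_lease(lease_table, finished)


# Stale listing at shard end: the child shard is not visible yet, so the worker goes idle.
print(next_lease_after_shard_end(["shard-A"], ["shard-A"]))             # None
# Up-to-date listing at shard end: the child shard is picked up.
print(next_lease_after_shard_end(["shard-A"], ["shard-A", "shard-B"]))  # shard-B
```

In the first call, the only thing that could add `shard-B` to the lease table is another sync, and with a single worker no further sync is ever triggered.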

I am not sure whether this issue belongs to the KCL library or to the DynamoDB Adapter. KCL seems to work under the assumption that information about new shards is always available before a shard end is reached. I don't know whether this assumption is intentional and violated by the Adapter, or whether the assumption is wrong and has to be fixed in KCL. I have therefore created this issue in both projects. The same issue in the other project: https://github.com/awslabs/amazon-kinesis-client/issues/442

Libraries used:

parijatsinha commented 5 years ago

We are aware of this issue (leases not created due to delay in shards appearing in Streams metadata) and released a fix in v1.4.0. Are you initializing your worker using the recommended factory method mentioned in the Readme?

klesniewski commented 5 years ago

Thank you for your fast response! Great to know you already have a fix for it. We are not using the factory mentioned in the readme, but I will give it a try now. If I understand correctly, when the proxy is used, it detects the case where some new shards are not returned and retries a few times before returning, so that the new shards are included. Is that more or less correct?

Could you please update the documentation? I was following the Walkthrough there, but it does not use the newly added, recommended worker factory. Other people may run into the same problem in the future.

klesniewski commented 5 years ago

The fix has been in production for nearly a week now, and the issue has not appeared since. We can see in the logs that, over the last 3 days, the added proxy spotted and resolved inconsistencies roughly once a day.

2018-10-16 03:35:26,939 DEBUG: Building shard graph snapshot; total shard count: 8
2018-10-16 03:35:26,939  INFO: Inconsistency resolution retry attempt: 0. Backing off for 934 millis.
2018-10-16 03:35:27,873  WARN: Inconsistent shard graph state detected. Fetched: 8 shards. Closed leaves: 1 shards
2018-10-16 03:35:27,873 DEBUG: Following leaf node shards are closed: shardId-********************-c685d878
2018-10-16 03:35:27,883 DEBUG: Attempting to resolve inconsistencies in the graph with the following shards:
 shardId-********************-611e6f95
2018-10-16 03:35:27,883 DEBUG: Resolving inconsistencies in shard graph; total shard count: 9
2018-10-16 03:35:27,883  INFO: An intermediate page in DescribeStream response resolved inconsistencies. Total retry attempts taken to resolve inconsistencies: 1
2018-10-16 03:35:27,883 DEBUG: Num shards: 9
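Judging from these log lines, the proxy seems to treat a closed shard without any child as a sign of an incomplete listing, and it retries with a backoff until the graph looks consistent. The Python sketch below illustrates that idea only; it is not the adapter's code. The shard representation, `closed_leaves`, `list_shards_consistently`, the retry limit, and the plain exponential backoff are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Optional

Shard = Dict[str, Optional[object]]  # keys: "id", "parent", "closed" (hypothetical shape)


def closed_leaves(shards: List[Shard]) -> List[object]:
    """Return ids of closed shards that no other listed shard names as its parent."""
    parents = {s["parent"] for s in shards if s["parent"] is not None}
    return [s["id"] for s in shards if s["closed"] and s["id"] not in parents]


def list_shards_consistently(
    describe: Callable[[], List[Shard]],
    max_retries: int = 3,
    base_backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Shard]:
    """List shards, retrying while the graph still contains closed leaves.

    `describe` is a hypothetical callable standing in for a full shard listing.
    After `max_retries` unsuccessful retries, the last listing is returned as-is.
    """
    shards = describe()
    for attempt in range(max_retries):
        if not closed_leaves(shards):
            return shards                      # graph is consistent
        sleep(base_backoff_s * (2 ** attempt)) # assumed backoff policy
        shards = describe()
    return shards
```

A caller would pass a function that performs the actual listing, and could inject a no-op `sleep` in tests to avoid waiting.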

I think we can consider the problem resolved. Thank you guys for fixing it! I will leave the issue open as a reminder to update the documentation.

parijatsinha commented 5 years ago

I have requested that the documentation/walkthrough be updated.

parijatsinha commented 5 years ago

Documentation has been updated. Closing this issue.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.CompleteProgram.html