We have a Kinesis data stream with 10 shards (with enhanced fan-out enabled automatically by the KCL). We also have the MultiLangDaemon from KCL 2.x (Python) deployed in Docker. The requirement is to fetch data from all 10 shards and trigger some data-processing steps once there is enough data. While testing this setup, we found that one worker is spawned per shard as a separate process, each collecting records in its own local memory. The questions are:
1) How do we get all the workers to write records to a shared list? (Is that even possible? With Docker we define a single entry point, which spins up the MultiLangDaemon, so there is no other entry point where we could create a shared list for the workers to write into.)
2) If a shared list is not an option, is there a way to limit the number of workers to just one (a single worker collecting records from all 10 shards) and avoid the shared memory/list path altogether?
Something we tried to enforce a single worker:
We read somewhere that setting maxActiveThreads=3 would allow only one worker. When we tested with this property set, the behavior was inconsistent. Sometimes there was just one worker (which worked fine); at other times there were more than one, and one of them would wait indefinitely on a specific shard, blocking the others. Logs below:
2020-11-28 04:36:26,092 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Previous PROCESS task still pending for shard shardId-000000000105 since PT8M56.677S ago.
2020-11-28 04:36:26,092 [multi-lang-daemon-0000] INFO s.a.kinesis.coordinator.Scheduler [NONE] - Sleeping ...
2020-11-28 04:36:27,092 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Last time data arrived: 2020-11-28T04:27:29.415Z (PT8M57.677S)
2020-11-28 04:36:27,092 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Previous PROCESS task still pending for shard shardId-000000000105 since PT8M57.677S ago.
2020-11-28 04:36:28,093 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Last time data arrived: 2020-11-28T04:27:29.415Z (PT8M58.678S)
2020-11-28 04:36:28,093 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Previous PROCESS task still pending for shard shardId-000000000105 since PT8M58.678S ago.
2020-11-28 04:36:29,093 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Last time data arrived: 2020-11-28T04:27:29.415Z (PT8M59.678S)
2020-11-28 04:36:29,093 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Previous PROCESS task still pending for shard shardId-000000000105 since PT8M59.678S ago.
2020-11-28 04:36:30,093 [multi-lang-daemon-0000] WARN s.a.kinesis.lifecycle.ShardConsumer [NONE] - Last time data arrived: 2020-11-28T04:27:29.415Z (PT9M0.678S)
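For reference, the relevant part of the MultiLangDaemon properties file looked roughly like the sketch below. Stream, application, region, and executable names are placeholders; maxActiveThreads is the property we experimented with.

```properties
# Illustrative MultiLangDaemon properties (names and values are placeholders)
executableName = sample_kclpy_app.py
streamName = my-stream
applicationName = my-kcl-app
processingLanguage = python/3.8
initialPositionInStream = LATEST
regionName = us-east-1

# Caps the daemon's thread pool; the setting we experimented with
maxActiveThreads = 3
```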