Open eankit opened 1 year ago
Just adding to this issue, we are seeing memory usage problems after upgrading from 0.14.12 to 0.15.5. This caused our instances to encounter out-of-memory exceptions over the weekend. We are downgrading back to 0.14.12 until this is resolved.
We seem to be encountering this as well. During load tests, we can see overall container memory gradually increasing over time and, after the tests complete, the memory usage remains high even hours later.
We're running KPL v0.15.8 currently.
The following Massif output is from a KPL process put under load (~500 records/sec) for a couple of hours. After the load was stopped, the KPL was allowed to drain to outstandingRecordsCount=0, while the process's memory remained at ~1.8 GB (including Valgrind overhead). The process was then left idle overnight with some small load added (~2 records/sec) before finally being shut down.
Massif output: https://gist.github.com/jhead/33d8f7c5d847b628859510d38d4f5659
It's worth noting that we found this occurs primarily when being throttled by Kinesis (not enough shards). At first we thought the KPL hadn't drained its internal queue of records, but even after waiting for outstandingRecordsCount=0, the memory remained unchanged, so it seems like a memory leak.
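For reference, the drain-to-zero check was done roughly like the sketch below (assuming the Java KPL; the polling interval and log output are illustrative, not from our actual test harness):

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;

public class KplDrainCheck {
    // Ask the native kinesis_producer daemon to send everything it has
    // buffered, wait until it reports no outstanding records, then tear it
    // down. Even once the count reaches zero, the daemon's resident memory
    // stays high, which is the behaviour described above.
    static void drainAndShutdown(KinesisProducer producer) throws InterruptedException {
        producer.flush(); // flush all buffered records for all streams

        while (producer.getOutstandingRecordsCount() > 0) {
            System.out.println("Outstanding records: " + producer.getOutstandingRecordsCount());
            Thread.sleep(1000);
        }

        // Shut down the child kinesis_producer process.
        producer.destroy();
    }
}
```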
Below is a dashboard from another load test with ~2,000 records/sec and significant throttling (1 shard on the stream). During this test, each KPL process was found to be using in excess of 7 GB of memory.
We have found an issue where the kinesis_producer daemon process retains memory for a long time after processing all the events, which increases the overall memory usage of our system. How can we drain this memory from the daemon? Are there any config settings that can help us release it? Under high load this grows to multiple GB and therefore eats into application memory.
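The only workaround we can think of so far is application-side backpressure, so the daemon's buffers never grow unbounded in the first place. A rough sketch of that idea, assuming the Java KPL (the 10,000-record cap is an arbitrary example, not a recommended value):

```java
import java.nio.ByteBuffer;
import com.amazonaws.services.kinesis.producer.KinesisProducer;

public class BackpressurePut {
    // Arbitrary example cap on records buffered inside the kinesis_producer
    // daemon; tune to your own latency/memory trade-off.
    private static final int MAX_OUTSTANDING = 10_000;

    static void putWithBackpressure(KinesisProducer producer, String stream,
                                    String partitionKey, ByteBuffer data)
            throws InterruptedException {
        // Block the caller while the daemon is already holding too many
        // records, instead of letting its buffers (and memory) keep growing.
        while (producer.getOutstandingRecordsCount() > MAX_OUTSTANDING) {
            Thread.sleep(10);
        }
        producer.addUserRecord(stream, partitionKey, data);
    }
}
```

Note that this only bounds how much the daemon buffers at a time; it does not release memory the daemon has already acquired, which is the behaviour described above.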