danielqsj / kafka_exporter

Kafka exporter for Prometheus
Apache License 2.0

Memory leaks in this exporter #193

Open akamensky opened 3 years ago

akamensky commented 3 years ago

Not sure under what conditions, but this exporter leaks memory like hell, even causing OOM kills when not caught in time:

[Dec 8 09:56] kafka_exporter invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[  +0.000009] kafka_exporter cpuset=/ mems_allowed=0
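Until the leak is pinned down, one stopgap is to cap the exporter's memory so the kernel kills only the exporter's cgroup rather than picking an arbitrary OOM victim. A minimal sketch as a systemd drop-in, assuming the exporter runs as a systemd service named `kafka_exporter` (the unit name, path, and limit value are illustrative; adjust to your setup):

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/kafka_exporter.service.d/memory.conf
[Service]
MemoryMax=256M        # cgroup OOM kill hits only the exporter
Restart=on-failure    # bring it back up after the kill
RestartSec=5
```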
vin01 commented 3 years ago

I encountered the same issue, and it looks like it happens when connection failures occur (timeouts due to a firewall, or an unresponsive broker under heavy load).

I think it is an upstream issue, I raised an issue in Sarama to discuss this further: https://github.com/Shopify/sarama/issues/1857

atrbgithub commented 3 years ago

@vin01 this may be related https://github.com/danielqsj/kafka_exporter/issues/54

atrbgithub commented 3 years ago

We see memory gradually rising over the course of 6 days or so (from 9 MB to 36 MB, for example), and then suddenly it will shoot up to, say, 150 MB, at which point it is restarted.
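Gradual growth like this is easy to track from the exporter's own scrape, since the Prometheus Go client exposes the process's resident set size as `process_resident_memory_bytes`. A minimal sketch of extracting it from a text-format scrape; in practice you would pipe `curl -s http://localhost:9308/metrics` (9308 is the exporter's default listen port, adjust to your deployment) into the same `awk`:

```shell
# Extract the exporter's RSS from a Prometheus text-format exposition.
# The printf below stands in for a real scrape of /metrics.
printf '%s\n' \
  '# TYPE process_resident_memory_bytes gauge' \
  'process_resident_memory_bytes 3.6e+07' |
awk '$1 == "process_resident_memory_bytes" { print $2 }'
# prints: 3.6e+07
```

Logging that value every few minutes gives a cheap growth curve without needing a full Prometheus query.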

Over the course of the whole period, I have only seen one instance of `kafka: broker not connected`, so I'm not sure if it is related to the Sarama issue, unless the connection problem isn't written to the logs when it occurs.

akamensky commented 3 years ago

In our case the connection timeout happens between the Prometheus scraper and this exporter. The exporter itself can scrape metrics from Kafka, but getting the page over HTTP takes a very long time. The server is quite busy, with load average staying pretty high. We don't see the same issue on servers that are not loaded.
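One way to confirm that scrape latency (rather than Kafka connectivity) is the problem is to time a scrape directly, bypassing Prometheus. A sketch, assuming the exporter listens on its default port 9308 (adjust host/port to your deployment):

```shell
# Time an end-to-end scrape of the exporter's metrics page:
curl -o /dev/null -s \
  -w 'total: %{time_total}s  connect: %{time_connect}s\n' \
  http://localhost:9308/metrics
```

If `connect` is fast but `total` is large on the loaded hosts, the time is being spent generating the response, not in the network.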

So I don't think it is related to Sarama at all.

vin01 commented 3 years ago

Any relevant logs from the exporter, @akamensky? You can also try enabling `log.enable-sarama` to get some more logs from Sarama in case there is something.
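For reference, a sketch of an invocation with Sarama's internal logging turned on (the broker address is a placeholder; adjust to your cluster):

```shell
kafka_exporter --kafka.server=broker1:9092 --log.enable-sarama
```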