danielqsj / kafka_exporter

Kafka exporter for Prometheus
Apache License 2.0

Excessive (10GB+) memory usage for Kafka exporter #385

Closed joachimbulow closed 1 year ago

joachimbulow commented 1 year ago

We have a few topics in our Kafka cluster, around 10 brokers, and only a few partitions. Consumer groups number in the thousands.

We have deployed Kafka using the Strimzi Kafka Operator Helm chart (0.34.0).

This comes equipped with a kafka-exporter instance.

When trying to scrape the exposed endpoint, the following events are logged in the pod description:

image

The pods try to consume absolutely bogus amounts of memory (10GB+), causing them all to get evicted:

image

The following is our Kafka CRD (again, using Strimzi operator):

image

We haven't really set any custom configuration, but I do not see how this level of memory consumption should even be possible.

Any hints as to where the potential memory leak could be / where we might have set it up wrong?

FYI, we did add generous amounts of resources to the Kafka CRD.

EDIT:

I have identified an extremely large number of consumer group offsets stored in the cluster and therefore processed by the exporter. We have spun consumers up and down with unique group IDs many times, creating millions of groups over time.
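For anyone wanting to check whether they are in the same situation, here is a minimal sketch for sizing up the consumer group count from inside a broker. The variable names `KAFKA_CG` and `BOOTSTRAP` and the helper functions are made up for this example, and the prefix breakdown assumes the unique part of each group ID is its last `-`, `_` or `.`-separated segment, which may not hold for your naming scheme.

```shell
#!/usr/bin/env bash
# Sketch only: KAFKA_CG and BOOTSTRAP default to the CLI path and address
# used in this issue; both variable names are made up for this example.
KAFKA_CG="${KAFKA_CG:-/opt/kafka/bin/kafka-consumer-groups.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# List all consumer group IDs, one per line.
list_groups() {
  "$KAFKA_CG" --bootstrap-server "$BOOTSTRAP" --list
}

# Count the non-empty lines on stdin (one group ID per line).
count_groups() {
  grep -c . || true
}

# Strip the last "-", "_" or "."-separated segment from each ID on stdin
# (assumed to be the unique part) and print the N most common prefixes.
top_group_prefixes() {
  sed -E 's/[-_.][^-_.]*$//' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage, inside a broker pod:
#   list_groups | count_groups
#   list_groups | top_group_prefixes 5
```

A count in the millions, dominated by one or two prefixes, would point at a client that generates a fresh group ID on every start.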

I deleted all inactive consumer groups, as they congested the exporter, by executing the following script inside a broker:

# Delete every consumer group whose state is Empty (no active members).
# --state adds the group state to the --describe output; it is the second-to-last column of the result row.
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list | while read -r GROUP; do \
  STATUS=$(/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group "$GROUP" --state | awk 'NF {state=$(NF-1)} END {print state}'); \
  if [ "$STATUS" = "Empty" ]; then \
    echo "Deleting consumer group: $GROUP"; \
    /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group "$GROUP"; \
  else \
    echo "Skipping active consumer group: $GROUP"; \
  fi; \
done

This issue will be closed, as it was in fact just an absurd amount of data. We could of course also limit the exporter so that it does not scrape the offsets at all.
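On limiting the exporter: kafka_exporter accepts a `--group.filter` regex, and as far as I know Strimzi exposes the same setting as `groupRegex` under `spec.kafkaExporter` in the Kafka resource. Below is a rough sketch for previewing what a candidate pattern would keep before applying it. The helper name, the group IDs and the pattern are hypothetical, and `grep -E` only approximates the Go regular expressions the exporter uses.

```shell
#!/usr/bin/env bash
# Preview which consumer groups a candidate filter regex would keep.
# Reads group IDs on stdin (one per line) and prints the matching ones.
# grep -E only approximates the Go regexp matching used by the exporter,
# so treat this as a rough check for simple patterns.
preview_group_filter() {
  grep -E -- "$1" || true
}

# Hypothetical group IDs and pattern, for illustration only:
printf 'billing-main\ntmp-7f3a\ntmp-91bc\n' | preview_group_filter '^billing-'

# The same pattern could then be handed to the exporter, for example:
#   kafka_exporter --kafka.server=localhost:9092 --group.filter='^billing-'
```

Filtering only reduces what the exporter reports; the stale groups themselves stay in the cluster until they are deleted or their offsets expire.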