danielqsj / kafka_exporter

Kafka exporter for Prometheus
Apache License 2.0

An error has occurred during metrics gathering #112

Open kirashet opened 4 years ago

kirashet commented 4 years ago

I'm using the official docker image of danielqsj/kafka-exporter.

Set it up, works like a charm.

Then we tested some resilience scenarios (setup is 1x ZooKeeper, 2x Kafka brokers). We switched the brokers off one after another and then started them back up.

Kafka itself was fine; only the kafka-exporter has a problem.

If I call the metrics endpoint, I get

An error has occurred during metrics gathering:

7 error(s) occurred:

docker logs shows:

time="2019-07-11T08:51:49Z" level=error msg="Cannot get offset of group axTest: kafka: broker not connected" source="kafka_exporter.go:396"
time="2019-07-11T08:51:49Z" level=error msg="Cannot get consumer group: kafka: broker not connected" source="kafka_exporter.go:370"
time="2019-07-11T08:51:51Z" level=info msg="Refreshing client metadata" source="kafka_exporter.go:233"

Restarting the exporter didn't do the trick either.

Any ideas?
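For anyone trying to narrow this down, it can help to see which error messages (and which consumer groups) dominate the exporter log. The following is only a rough sketch: the function name `summarize_exporter_errors` is made up for illustration, and it assumes the log lines look like the ones quoted above (`level=error msg="..."`).

```shell
# Hypothetical helper: read kafka_exporter log text on stdin and print each
# distinct error message prefixed by the number of times it occurred.
# Assumes entries of the form: level=error msg="..."
summarize_exporter_errors() {
  # -o prints every match on its own line, so several entries on one
  # physical line are still counted separately.
  grep -o 'level=error msg="[^"]*"' \
    | sed 's/^level=error msg="//; s/"$//' \
    | sort \
    | uniq -c \
    | sed 's/^ *//'
}
```

It could be fed from the container log, for example `docker logs <container> 2>&1 | summarize_exporter_errors`, where `<container>` is whatever your exporter container is called. The `2>&1` is there on the assumption that the exporter logs to stderr.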

kirashet commented 4 years ago

Nobody?

kirashet commented 4 years ago

Ok guys, I deleted the consumer group via

kafka-consumer-groups --bootstrap-server localhost:9092 --delete --group testconsumer

and the kafka_exporter doesn't bitch anymore.

Unfortunately I don't know the root cause.

Does anyone?

simonasr commented 4 years ago

Having the same issue 😢

Allen-yan commented 3 years ago

Having the same issue, but I cannot delete the group.

kirashet commented 3 years ago

> Having the same issue, but I cannot delete the group

Why not?

moogzy commented 2 years ago

We've started to observe this on our redpanda cluster.

What we've found is that when broker leadership changes, or when pods/nodes restart, the exporter can fall into this state. A reboot, while disruptive, often resolves it, which is less than ideal.

Did anyone make progress on this?
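Until the root cause is known, one way to at least detect the state automatically is to probe the metrics endpoint and look for the error page quoted at the top of this issue. This is only a sketch under assumptions: the function names are made up, the URL `http://localhost:9308/metrics` is an assumed address for the exporter, and the check simply matches the error text reported in this thread.

```shell
# Succeeds (exit 0) when the text on stdin looks like the exporter's
# gathering-error page rather than normal metrics output.
is_gather_error() {
  grep -q 'An error has occurred during metrics gathering'
}

# Hypothetical probe. $1 is the metrics URL; the default is an assumption.
# Returns 0 when healthy, 1 on the error page, 2 when unreachable.
check_exporter() {
  url=${1:-http://localhost:9308/metrics}
  # Without -f, curl still exits 0 on an HTTP error status, so the body of
  # the error page is captured and can be inspected.
  if body=$(curl -s --max-time 5 "$url"); then
    if printf '%s\n' "$body" | is_gather_error; then
      echo "unhealthy: metrics gathering failed"
      return 1
    fi
    echo "healthy"
    return 0
  fi
  echo "unreachable"
  return 2
}
```

Something like `check_exporter || <restart command>` could then be run from cron or wired into a container health check. That only automates the workaround; it does not fix whatever leaves the exporter disconnected.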