danielqsj / kafka_exporter

Kafka exporter for Prometheus
Apache License 2.0

Exporter segfaults with --no-offset.show-all option #359

Open yl-anaumann opened 1 year ago

yl-anaumann commented 1 year ago

Hi there!

I know we're misusing Kafka a bit, but we have quite a number of consumer groups with a UUID in their names that are no longer connected (each one received a message that caused a restart/reload, which created a new consumer with a new UUID). That causes our Prometheus instances to grow wildly, depending on the number of configuration changes.

Disabling the offset.show-all option sounds exactly like the fix for our immediate problem, but whenever we pass --no-offset.show-all, the exporter segfaults regularly with the message below (I can increase the log level if it helps, but the last time I did that, the output wasn't much more verbose):

# /opt/kafka_exporter/kafka_exporter --kafka.server=localhost:9092 --web.listen-address=:9308 --no-offset.show-all

Jan 03 16:29:22 kafka_exporter[1678040]: I0103 16:29:21.553852 1678040 kafka_exporter.go:792] Starting kafka_exporter (version=1.6.0, branch=HEAD, revision=c021e94dfb808e642d41064c6550cbba87fe30c6)
Jan 03 16:29:22 kafka_exporter[1678040]: I0103 16:29:21.582035 1678040 kafka_exporter.go:963] Listening on HTTP :9308
Jan 03 16:33:01 kafka_exporter[1678040]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 03 16:33:01 kafka_exporter[1678040]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x9e0e05]
Jan 03 16:33:01 kafka_exporter[1678040]: goroutine 1107 [running]:
Jan 03 16:33:01 kafka_exporter[1678040]: main.(*Exporter).collect.func4(0xc00022ca80)
Jan 03 16:33:01 kafka_exporter[1678040]:         /app/kafka_exporter.go:582 +0xb05
Jan 03 16:33:01 kafka_exporter[1678040]: created by main.(*Exporter).collect
Jan 03 16:33:01 kafka_exporter[1678040]:         /app/kafka_exporter.go:658 +0x8f0
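
For readers unfamiliar with this class of panic, here is a minimal Go sketch of how a nil pointer dereference can arise when a lookup returns no entry, and how a nil check avoids it. This is not the exporter's actual code: the type offsetBlock and the functions lookup and safeOffset are hypothetical names made up for illustration, and whether the crash at kafka_exporter.go:582 has this exact shape is an assumption.

```go
package main

import "fmt"

// offsetBlock is a hypothetical stand-in for a per-partition offset entry.
type offsetBlock struct {
	Offset int64
}

// lookup returns the block for topic/partition, or nil when none exists.
// Indexing a missing topic yields a nil inner map, and indexing a nil map
// is safe in Go: it returns the zero value, here a nil pointer.
func lookup(blocks map[string]map[int32]*offsetBlock, topic string, partition int32) *offsetBlock {
	return blocks[topic][partition]
}

// safeOffset returns the offset and true, or 0 and false when there is no
// block, instead of dereferencing a nil pointer.
func safeOffset(blocks map[string]map[int32]*offsetBlock, topic string, partition int32) (int64, bool) {
	b := lookup(blocks, topic, partition)
	if b == nil {
		return 0, false
	}
	return b.Offset, true
}

func main() {
	blocks := map[string]map[int32]*offsetBlock{
		"orders": {0: {Offset: 42}},
	}

	if off, ok := safeOffset(blocks, "orders", 0); ok {
		fmt.Println("orders/0 offset:", off)
	}
	if _, ok := safeOffset(blocks, "payments", 3); !ok {
		fmt.Println("payments/3: no offset block, skipping")
	}

	// Without the guard, lookup(blocks, "payments", 3).Offset would panic with
	// "invalid memory address or nil pointer dereference".
}
```

A panic like this inside a goroutine that nobody recovers takes down the whole process, which would match the exporter exiting rather than just failing a single scrape.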

I haven't looked too deeply into it, but PR #332 looks like a quick win. I'll try to build it myself tomorrow to see whether it really fixes our problem while we work on streamlining the consumer group names in our application. I'm not a fan of maintaining many patches for our infrastructure, though; I prefer to download something that's already in the right state (who doesn't? :) ).

So my main question is whether there is an ETA for the next release that might include the fix.

Kind regards, Andre

yl-anaumann commented 1 year ago

Good news: installing the patched exporter made it a lot more stable.

yuyivic commented 1 year ago

I have the same problem as you.

Limpopoo93 commented 10 months ago

Hi there! We also get this error, and only when we pass the --no-offset.show-all flag. With the default, --offset.show-all, the error disappears. Are there any solutions to this problem? We're using version 1.7.0.

Kind regards, Alex