danielqsj / kafka_exporter

Kafka exporter for Prometheus
Apache License 2.0

Use with Redpanda #286

Open SebastianKG opened 2 years ago

SebastianKG commented 2 years ago

The Problem

I've been trying to get kafka_exporter working with the drop-in Kafka replacement Redpanda (https://vectorized.io/) in a Kubernetes environment. kafka_exporter connects to the Redpanda Service (which sits in front of several Redpanda Pods) just fine; however, when I visit kafka_exporter:9308/metrics in my browser, the page never resolves (it keeps loading for a very long time until the browser times it out). Neither Redpanda nor kafka_exporter logs any explicit errors during this time -- even with --verbosity=5 --log.enable-sarama specified for kafka_exporter, the only logs from kafka_exporter are:

[sarama] <some timestamp> client/metadata fetching metadata for all topics from broker <service name:service port>
<some other timestamp> 1 kafka_exporter.go:324] concurrent calls detected, waiting for first to finish

...repeated over and over again. Prometheus similarly has no success scraping the metrics endpoint; in its UI I can see the target listed, but its status is DOWN.
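
As a sanity check, the same metadata request can be issued outside the exporter with a few lines of sarama (the client library kafka_exporter uses). This is only a sketch: the broker address redpanda:9092, the protocol version, and the Shopify/sarama import path are assumptions, not taken from my setup.

package main

// Standalone sanity check: issue the same "fetch metadata for all topics"
// request that appears in the sarama log above, directly against the
// Redpanda Service. "redpanda:9092" is a placeholder address.

import (
    "fmt"
    "log"

    "github.com/Shopify/sarama"
)

func main() {
    cfg := sarama.NewConfig()
    cfg.Version = sarama.V2_0_0_0 // assumed protocol version; adjust to the cluster

    client, err := sarama.NewClient([]string{"redpanda:9092"}, cfg)
    if err != nil {
        log.Fatalf("connect: %v", err)
    }
    defer client.Close()

    // Refresh metadata for all topics, the call shown in the sarama log above.
    if err := client.RefreshMetadata(); err != nil {
        log.Fatalf("metadata: %v", err)
    }

    topics, err := client.Topics()
    if err != nil {
        log.Fatalf("topics: %v", err)
    }
    fmt.Printf("brokers: %d, topics: %v\n", len(client.Brokers()), topics)
}

If this completes quickly while /metrics still hangs, the problem would be in the exporter's collection loop rather than in the connection to Redpanda.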

Theories

Redpanda does not use Zookeeper, which is one of its big selling points (lower complexity). There are some references to Zookeeper in kafka_exporter's source code, along with two related configuration flags: use.consumelag.zookeeper and zookeeper.server. Is it sensible to think this is the source of the problem? Is there any way around it? I would like to use your tool, since (apart from this problem) it would let me treat Redpanda and Kafka interchangeably.
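
As far as I can tell, the exporter normally only needs broker addresses, and the two Zookeeper flags are opt-in (for consumer groups that commit offsets to Zookeeper). A minimal broker-only invocation (the service address here is a placeholder, not my exact config) would be roughly:

kafka_exporter --kafka.server=redpanda:9092 --verbosity=5 --log.enable-sarama

so Zookeeper itself should not be required for the basic metrics path.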

BenPope commented 2 years ago

With sufficient logging, Redpanda reports:

TRACE 2021-12-02 22:43:23,951 [shard 0] kafka - requests.cc:89 - [127.0.0.1:51444] processing name:metadata, key:3, version:5 for kafka_exporter
TRACE 2021-12-02 22:43:23,951 [shard 0] kafka - request_context.h:152 - [127.0.0.1:51444] sending 3:metadata response {throttle_time_ms=0 brokers={{node_id=1 host=0.0.0.0 port=19092 rack={nullopt}}} cluster_id={nullopt} controller_id=1 topics={{error_code={ error_code: none [0] } name={kafka_exporter_test} is_internal=false partitions={{error_code={ error_code: none [0] } partition_index=0 leader_id=1 leader_epoch=-1 replica_nodes={{1}} isr_nodes={{1}} offline_replicas={}}, {error_code={ error_code: none [0] } partition_index=1 leader_id=1 leader_epoch=-1 replica_nodes={{1}} isr_nodes={{1}} offline_replicas={}}, {error_code={ error_code: none [0] } partition_index=2 leader_id=1 leader_epoch=-1 replica_nodes={{1}} isr_nodes={{1}} offline_replicas={}}} topic_authorized_operations=0}} cluster_authorized_operations=-2147483648}
askeydeteque commented 2 years ago

Hi, I just want to note that I am using this exporter with Redpanda without any issues; it is working as expected.

ETetzlaff commented 2 years ago

Just chiming in here: I too am running kafka-exporter alongside Redpanda in Kubernetes without problems. While deploying kafka-exporter I did come across the same log output you describe, though I think it is a different bug from the one in this issue's title.

[sarama] <some timestamp> client/metadata fetching metadata for all topics from broker <service name:service port>
<some other timestamp> 1 kafka_exporter.go:324] concurrent calls detected, waiting for first to finish

This message is obscure; I found it only occurs when there are topics present on the Redpanda cluster but zero consumer groups. Once I added a consumer group subscribed to a topic and restarted kafka-exporter, things started to work. I'm not sure whether the same happens with a traditional Kafka cluster.
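
To make the condition concrete, here is a small sketch (again assuming the Shopify/sarama import path and a placeholder broker address) that asks the cluster whether any consumer groups exist, which is the state that appeared to trigger the hang:

package main

// Sketch: list consumer groups to check for the "topics but zero consumer
// groups" state described above. "redpanda:9092" is a placeholder address.

import (
    "fmt"
    "log"

    "github.com/Shopify/sarama"
)

func main() {
    cfg := sarama.NewConfig()
    cfg.Version = sarama.V2_0_0_0 // assumed; match the cluster's protocol version

    admin, err := sarama.NewClusterAdmin([]string{"redpanda:9092"}, cfg)
    if err != nil {
        log.Fatalf("admin: %v", err)
    }
    defer admin.Close()

    groups, err := admin.ListConsumerGroups()
    if err != nil {
        log.Fatalf("list groups: %v", err)
    }
    if len(groups) == 0 {
        fmt.Println("no consumer groups found -- the state that seemed to hang the exporter")
        return
    }
    for name := range groups {
        fmt.Println("consumer group:", name)
    }
}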

ETetzlaff commented 2 years ago

I started looking into this and was able to reproduce it on kafka_exporter v1.4.2 but not on master. It seems to me that this commit fixed the issue I described above.