Yolean / kubernetes-kafka

Kafka cluster as Kubernetes StatefulSet, plain manifests and config
Apache License 2.0

Recommended Grafana dashboard (Prometheus)? #262

Open guyromb opened 5 years ago

guyromb commented 5 years ago

At the moment I am unable to find a Grafana dashboard that works with the metrics exporter. It seems like it is missing common query functions.

I also tried this template to get additional metrics: https://github.com/rama-nallamilli/kafka-prometheus-monitoring/blob/master/prometheus-jmx-exporter/confd/templates/kafka.yml.tmpl together with this Grafana template: https://github.com/rama-nallamilli/kafka-prometheus-monitoring/edit/master/dashboards/Kafka.json but got no results.

solsson commented 5 years ago

I'm on the lookout too, passively :)

I'm also eager to see how #259 affects our Kafka monitoring, but we haven't rolled it out. The readme for https://github.com/google-cloud-tools/kafka-minion mentions a Grafana dashboard. @weeco any status on that?

weeco commented 5 years ago

Sure, I want to release a new version of Kafka Minion which contains some more metrics about topic configuration. Once that's done I'll offer a dashboard. You can expect stuff like this:

[Image: Kafka Minion]

I am not sure if this is something @guyromb is looking for. He was probably looking for a dashboard to monitor Kafka itself, rather than the consumer group lags. We could share our dashboard on this soon as well.

guyromb commented 5 years ago

That's awesome, @weeco. What adjustments are recommended to make this work with this repo's configuration? I would like to have something working, even if it is only minimal monitoring.

weeco commented 5 years ago

You need to run https://github.com/google-cloud-tools/kafka-minion, which is an alternative to LinkedIn's Burrow. Solsson opened a PR for it on this repo: https://github.com/Yolean/kubernetes-kafka/pull/259. We've got a Helm chart for it, but it's not done yet.

solsson commented 5 years ago

@guyromb I recommend you start with kafka-minion then, and report any issues in #259. Consumer lag monitoring says a lot about Kafka health. Kafka-minion means one layer (and one scrape interval) fewer than with Burrow, because it's developed for use with Prometheus / OpenMetrics.

At Yolean we use the generic dashboards from Prometheus Operator quite a lot too. You'll have graphs for memory and CPU for example.
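
To illustrate why consumer lag is a useful health signal: for each partition, lag is the difference between the partition's latest (log end) offset and the offset the consumer group last committed. The Python sketch below shows that arithmetic only. It is not how kafka-minion or Burrow is implemented, and the topic name, group names, and offsets are made-up sample data.

```python
from collections import defaultdict


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Messages written to a partition that the group has not yet committed.

    Clamped at zero, since the two offsets may be sampled at slightly
    different moments.
    """
    return max(log_end_offset - committed_offset, 0)


def group_topic_lag(log_end_offsets, committed_offsets):
    """Sum per-partition lag into one number per (group, topic).

    log_end_offsets:   {(topic, partition): offset}
    committed_offsets: {(group, topic, partition): offset}
    """
    totals = defaultdict(int)
    for (group, topic, partition), committed in committed_offsets.items():
        end = log_end_offsets[(topic, partition)]
        totals[(group, topic)] += partition_lag(end, committed)
    return dict(totals)


# Hypothetical sample data, not taken from a real cluster.
log_end_offsets = {("orders", 0): 1500, ("orders", 1): 1200}
committed_offsets = {
    ("billing", "orders", 0): 1450,
    ("billing", "orders", 1): 1200,
    ("audit", "orders", 0): 900,
}

lags = group_topic_lag(log_end_offsets, committed_offsets)
print(lags)
```

A lag that keeps growing for a group suggests its consumers are falling behind or have stopped, which is why this number is worth graphing and alerting on.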

guyromb commented 5 years ago

Thanks for clarifying this, @solsson. I will give it a try.

weeco commented 5 years ago

@guyromb @solsson I added the suggested Grafana dashboard for Kafka Minion. See: https://github.com/cloudworkz/kafka-minion or https://grafana.com/dashboards/10083

solsson commented 5 years ago

@guyromb I've bumped the image in #259 to the new release.