confluentinc / jmx-monitoring-stacks

📊 Monitoring examples for Confluent Cloud and Confluent Platform
Apache License 2.0
ansible ccloud cloudwatch confluent confluent-cloud confluent-platform datadog-agent elastic grafana influxdb jmx-exporter jolokia kafka kibana kubernetes metricbeat monitoring newrelic prometheus telegraf

Overview

This repo provides examples of JMX monitoring stacks that can monitor Confluent Cloud and Confluent Platform. While the Confluent Cloud UI and Confluent Control Center provide an opinionated view of Apache Kafka monitoring, JMX monitoring stacks serve a broader purpose: they let users set up monitoring across multiple parts of their organization, many outside of Kafka, and view it all through a single pane of glass.

This project provides metrics and dashboards for:

📊 Dashboards

Some examples:

List of available dashboards for Confluent Platform:

Dashboard Prometheus/Grafana New Relic Metricbeat/Kibana Telegraf/Influx Datadog
Kafka Cluster yes yes yes yes yes
Zookeeper yes yes yes
KRaft yes
Confluent Schema Registry yes yes
Kafka Connect yes yes
ksqlDB yes yes
Producer/Consumer yes yes yes yes
Lag Exporter yes
Topics yes yes
Kafka Streams yes
Kafka Streams RocksDB yes
Quotas yes
TX Coordinator yes
Confluent Rest Proxy yes
Confluent Cluster Linking yes
Confluent Oracle CDC connector yes
Debezium connectors yes
MongoDB connector yes
librdkafka clients yes
Confluent RBAC yes
Confluent Replicator yes
Confluent Tiered Storage yes
Confluent Flink yes

List of available dashboards for Confluent Cloud:

Dashboard Prometheus/Grafana New Relic Metricbeat/Kibana AWS CloudWatch
Cluster yes yes yes yes
Producer/Consumer yes
ksql yes
Billing/Cost tracking yes

⚠️ Alerts

Alerts are available for the stacks:

How to use with Confluent cp-ansible

To add JMX exporter configurations to Confluent cp-ansible, please refer to this README

How to use with Kubernetes and Confluent for Kubernetes Operator (CFK)

To add JMX exporter configurations to your Kubernetes workspace, please refer to this README

How to use with Confluent cp-demo

This repo is intended to work smoothly with Confluent cp-demo.

Make sure you have enough system resources on the local host to run this. Verify in the advanced Docker preferences settings that the memory available to Docker is at least 8 GB (default is 2 GB).

NOTE: jq must be installed on your machine to run the demo.

  1. Ensure that cp-demo is not already running on the local host.

  2. Decide which monitoring stack to demo and set the MONITORING_STACK variable accordingly.

NOTE: New Relic requires a License Key to be added in jmxexporter-newrelic/start.sh

NOTE: Datadog requires a DATADOG_API_KEY and DATADOG_SITE to be added in datadog/start.sh. Datadog offers a 14-day trial for new users.

# Set only one of these
MONITORING_STACK=jmxexporter-prometheus-grafana
MONITORING_STACK=metricbeat-elastic-kibana
MONITORING_STACK=jmxexporter-newrelic
MONITORING_STACK=jolokia
MONITORING_STACK=jolokia-telegraf-influxdb
MONITORING_STACK=datadog
  3. Clone cp-demo and check out a branch.

# Example with cp-demo version 7.7.1
CP_DEMO_VERSION=7.7.1-post

[[ -d "cp-demo" ]] || git clone https://github.com/confluentinc/cp-demo.git
(cd cp-demo && git fetch && git checkout "$CP_DEMO_VERSION" && git pull)

  4. Clone jmx-monitoring-stacks and check out the main branch.

[[ -d "jmx-monitoring-stacks" ]] || git clone https://github.com/confluentinc/jmx-monitoring-stacks.git
(cd jmx-monitoring-stacks && git fetch && git checkout main && git pull)

  5. Start the monitoring solution for the selected stack. This command also starts cp-demo, so you do not need to start cp-demo separately.

${MONITORING_STACK}/start.sh

  6. Stop the monitoring solution. This command also stops cp-demo, so you do not need to stop cp-demo separately.

${MONITORING_STACK}/stop.sh
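
Before starting a stack, it can help to confirm that the prerequisites listed above are met. The following is a minimal sketch, not part of this repo: the function names (is_supported_stack, has_command, preflight) are hypothetical, and the list of stack names is copied from the MONITORING_STACK values shown earlier in this section.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these functions ship with the repo.

# Stack directory names, copied from the MONITORING_STACK list in this README.
SUPPORTED_STACKS="jmxexporter-prometheus-grafana metricbeat-elastic-kibana jmxexporter-newrelic jolokia jolokia-telegraf-influxdb datadog"

# Returns 0 if $1 exactly matches one of the supported stack names.
is_supported_stack() {
  for known_stack in $SUPPORTED_STACKS; do
    if [ "$known_stack" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Returns 0 if the command named $1 can be found on PATH.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

# Checks the stack name and the jq requirement; prints a message either way.
preflight() {
  if ! is_supported_stack "$1"; then
    echo "Unknown MONITORING_STACK: '$1'" >&2
    return 1
  fi
  if ! has_command jq; then
    echo "jq is required but was not found on PATH" >&2
    return 1
  fi
  echo "Pre-flight checks passed for $1"
}

# Example usage (commented out so that sourcing this file has no side effects):
# preflight "$MONITORING_STACK" && "${MONITORING_STACK}/start.sh"
```

The sketch only checks the stack name and the presence of jq. It does not verify Docker memory settings or the New Relic and Datadog keys, which still need to be configured as described in the notes above.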

How to use with Apache Kafka client applications (producers, consumers, Kafka Streams applications)

For an example that showcases how to monitor Apache Kafka client applications, and steps through various failure scenarios to see how they are reflected in the provided metrics, see the Observability for Apache Kafka® Clients to Confluent Cloud tutorial.

How to use with specific configurations: DEV-toolkit

Dev-toolkit is an environment that allows you to easily create different configurations and deployments to verify the metrics exposed by different components of the Confluent Platform.

Dev-toolkit is based on the Prometheus and Grafana stack.

To run a lightweight default environment, follow these steps:

  1. cd dev-toolkit
  2. [Optional]: Put your new dashboards into the grafana-wip folder. All existing Grafana dashboards are loaded regardless.
  3. start.sh
  4. For Grafana, go to http://localhost:3000 and log in with admin/password
  5. stop.sh
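
Grafana may need some time to become reachable after start.sh returns. The following is a minimal sketch of a polling helper that could be used between steps 3 and 4. The retry function is hypothetical and not part of dev-toolkit; the usage line assumes curl is installed and that Grafana answers on its /api/health endpoint at the URL from step 4.

```shell
#!/bin/sh
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_count=1
  while ! "$@"; do
    if [ "$retry_count" -ge "$retry_attempts" ]; then
      return 1
    fi
    retry_count=$((retry_count + 1))
    sleep "$retry_delay"
  done
  return 0
}

# Example usage (assumes curl and a local Grafana on port 3000):
# retry 30 2 curl -fsS -o /dev/null http://localhost:3000/api/health \
#   && echo "Grafana is up"
```

The attempt count and delay in the usage line are arbitrary illustrative values, not recommendations from the project.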

Run with profiles

Default profile will create:

To add more use cases, the toolkit relies on Docker Compose profiles.

For example, to run the replicator scenario, run start.sh --profile replicator.

It's possible to combine profiles as well, e.g. start.sh --profile schema-registry --profile ksqldb.
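
When the set of profiles changes often, the repeated --profile flags can be generated from a plain list of names. This is a small sketch under that assumption; profile_args is a hypothetical helper, and only the profile names shown above (replicator, schema-registry, ksqldb) are taken from this README.

```shell
#!/bin/sh
# Hypothetical helper: prints one "--profile NAME" pair per argument,
# separated by single spaces, followed by a newline.
profile_args() {
  profile_out=""
  for profile_name in "$@"; do
    # Add a separating space only when something has already been collected.
    profile_out="${profile_out:+$profile_out }--profile $profile_name"
  done
  printf '%s\n' "$profile_out"
}

# Example usage from the dev-toolkit folder. The command substitution is left
# unquoted on purpose so that the shell splits it into separate arguments:
# ./start.sh $(profile_args schema-registry ksqldb)
```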

Currently supported profiles:

DEV-toolkit FAQ

More docker-compose environments will be released in the future. In the meantime, you can use Kafka Docker Composer to create your own.

You can add them to start.sh, for example:

# ADD client monitoring to prometheus config
cat <<EOF >> assets/prometheus/prometheus-config/prometheus.yml

  - job_name: 'spring-client'
    static_configs:
      - targets: ['spring-client:9191']
        labels:
          env: "dev"
EOF

You can also change the Prometheus configuration here.
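
The heredoc shown earlier appends unconditionally, so running it more than once against the same file may add the same job twice. A minimal sketch of an idempotent variant follows. The append_scrape_job function is hypothetical and not part of the repo; the job name, target, and label mirror the spring-client example above.

```shell
#!/bin/sh
# Hypothetical helper: append_scrape_job CONFIG_FILE JOB_NAME TARGET
# Appends a scrape job to CONFIG_FILE unless a job with that name is
# already present. The env label is fixed to "dev", as in the example above.
append_scrape_job() {
  cfg_file="$1"
  cfg_job="$2"
  cfg_target="$3"
  # -F treats the pattern as a fixed string, so the quotes match literally.
  if grep -qF "job_name: '${cfg_job}'" "$cfg_file" 2>/dev/null; then
    return 0
  fi
  cat <<EOF >> "$cfg_file"

  - job_name: '${cfg_job}'
    static_configs:
      - targets: ['${cfg_target}']
        labels:
          env: "dev"
EOF
}

# Example usage with the path and values from the snippet above:
# append_scrape_job assets/prometheus/prometheus-config/prometheus.yml \
#   spring-client spring-client:9191
```

The duplicate check is a plain text match on the job name, so it does not parse the YAML and will not notice a job that was added with different quoting.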