andreas-schroeder / kafka-health-check

Health Check for Kafka Brokers.
MIT License

Kafka Health Check

Health checker for Kafka brokers and clusters that operates by checking whether:

Status

Release version is 0.1.0

Compiled binaries are available for Linux, macOS, and FreeBSD.

Use Cases

Submit a pull request to have your use case listed here!

Self-healing cluster

At AutoScout24, in order to reduce operational workload, we use kafka-health-check to automatically restart broker nodes as they become unhealthy.

In-place rolling updates

At AutoScout24, we perform regular in-place rolling updates to keep the OS of our clusters running on AWS up to date. As we run immutable servers, we terminate each broker and replace it with a fresh EC2 instance (keeping the previous broker id). To avoid jeopardizing cluster stability when terminating brokers, we verify that the cluster is healthy before taking a broker offline. Similarly, we wait for the replacement broker to come back online and fully catch up before proceeding with the next broker. To achieve this, we use the cluster health information provided by kafka-health-check.
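The gate described above can be sketched as a small polling loop. This is only an illustration, not the tooling AutoScout24 actually runs: the function names, the timeout and polling values, and the injected `fetch` callable are assumptions made for the example. Only the `/cluster` endpoint, the default port 8000, and the `"status": "green"` value are taken from this README.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_cluster_health(host: str, port: int = 8000) -> dict:
    """Fetch and decode the JSON document served at /cluster."""
    url = f"http://{host}:{port}/cluster"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_green(
    fetch: Callable[[], dict],
    timeout_s: float = 600.0,  # assumed value, tune per cluster
    poll_s: float = 10.0,      # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the cluster reports "green" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch().get("status") == "green":
                return True
        except (OSError, ValueError):
            # Unreachable endpoint, HTTP error, or malformed JSON:
            # treat all of these as "not green yet".
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A rolling-update script could call `wait_until_green(lambda: fetch_cluster_health("broker-1.example"))` (hypothetical host name) before terminating a broker and again after its replacement has started, and abort the rollout if the call returns `False`.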

Usage

Usage of kafka-health-check:
  -broker-host string
        ip address or hostname of broker host (default "localhost")
  -broker-id uint
        id of the Kafka broker to health check
  -broker-port uint
        Kafka broker port (default 9092)
  -check-interval duration
        how frequently to perform health checks (default 10s)
  -no-topic-creation
        disable automatic topic creation and deletion
  -replication-failures-count uint
        number of replication failures before broker is reported unhealthy (default 5)
  -replication-topic string
        name of the topic to use for replication checks - use one per cluster, defaults to broker-replication-check
  -server-port uint
        port to open for http health status queries (default 8000)
  -topic string
        name of the topic to use - use one per broker, defaults to broker-<id>-health-check
  -zookeeper string
        ZooKeeper connect string (e.g. node1:2181,node2:2181,.../chroot)
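
As an illustration of how these flags combine, the following sketch assembles an argument list for one broker. The helper name `build_command` and its parameters are hypothetical, and the sketch assumes the binary is on `PATH` and accepts the `-flag=value` form; the flag names and defaults are the ones listed above.

```python
from typing import List, Sequence


def build_command(
    broker_id: int,
    zookeeper_nodes: Sequence[str],
    chroot: str = "",
    broker_host: str = "localhost",
    broker_port: int = 9092,
    server_port: int = 8000,
    check_interval: str = "10s",
) -> List[str]:
    """Build an argv list for kafka-health-check from the flags listed above."""
    # ZooKeeper connect string: comma-separated nodes, optional chroot suffix.
    connect = ",".join(zookeeper_nodes) + chroot
    return [
        "kafka-health-check",  # assumes the binary is on PATH
        f"-broker-id={broker_id}",
        f"-broker-host={broker_host}",
        f"-broker-port={broker_port}",
        f"-check-interval={check_interval}",
        f"-server-port={server_port}",
        f"-zookeeper={connect}",
    ]
```

The resulting list can be passed to `subprocess.run` or written into a service unit; for example, `build_command(1, ["node1:2181", "node2:2181"], "/kafka")` ends with `-zookeeper=node1:2181,node2:2181/kafka`.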

Broker Health

Broker health can be queried at /:

$ curl -s <broker-host>:8000/
{
    "broker": 1,
    "status": "sync"
}

Return codes and status values are:

The returned JSON lists the replicas on which the broker is lagging behind:

$ curl -s <broker-host>:8000/
{
    "broker": 3,
    "status": "imok",
    "out-of-sync": [
        {
            "topic": "mytopic",
            "partition": 0
        }
    ],
    "replication-failures": 1
}
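
A monitoring script might reduce this response to the partitions that are lagging. The sketch below parses the sample shown above; the function names are made up for the example, and it assumes that the `out-of-sync` key is simply absent when the broker is fully in sync, as in the first sample.

```python
import json
from typing import List, Tuple

# Sample reply copied from the example above.
SAMPLE_BROKER_REPLY = """
{
    "broker": 3,
    "status": "imok",
    "out-of-sync": [{"topic": "mytopic", "partition": 0}],
    "replication-failures": 1
}
"""


def lagging_partitions(body: str) -> List[Tuple[str, int]]:
    """Return (topic, partition) pairs from the "out-of-sync" list, if any."""
    report = json.loads(body)
    return [(e["topic"], e["partition"]) for e in report.get("out-of-sync", [])]


def summarize(body: str) -> str:
    """Render a one-line summary of a broker health reply."""
    report = json.loads(body)
    count = len(lagging_partitions(body))
    return f"broker {report['broker']}: {report['status']}, {count} partition(s) out of sync"
```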

Cluster Health

Cluster health can be queried at /cluster:

$ curl -s <broker-host>:8000/cluster
{
    "status": "green"
}

Return codes and status values are:

The returned JSON contains details about metadata status and partition replication:

$ curl -s <broker-host>:8000/cluster
{
    "status": "yellow",
    "topics": [
        {
            "topic": "mytopic",
            "status": "yellow",
            "partitions": {
                "1": {
                    "status": "yellow",
                    "OSR": [
                        3
                    ]
                },
                "2": {
                    "status": "yellow",
                    "OSR": [
                        3
                    ]
                }
            }
        }
    ]
}
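
The nested structure can be flattened for alerting or logging. The following sketch walks the sample above and collects the `OSR` entries per partition; the function name is hypothetical, and reading `OSR` as a list of replica broker ids is an interpretation of the sample rather than something this section states.

```python
import json
from typing import Dict, List, Tuple

# Sample reply copied from the example above.
SAMPLE_CLUSTER_REPLY = """
{
    "status": "yellow",
    "topics": [
        {
            "topic": "mytopic",
            "status": "yellow",
            "partitions": {
                "1": {"status": "yellow", "OSR": [3]},
                "2": {"status": "yellow", "OSR": [3]}
            }
        }
    ]
}
"""


def partitions_with_osr(body: str) -> Dict[Tuple[str, int], List[int]]:
    """Map (topic, partition) to its "OSR" list for every partition that has one."""
    report = json.loads(body)
    result: Dict[Tuple[str, int], List[int]] = {}
    for topic in report.get("topics", []):
        # Partition ids are JSON object keys, so they arrive as strings.
        for partition_id, info in topic.get("partitions", {}).items():
            if info.get("OSR"):
                result[(topic["topic"], int(partition_id))] = info["OSR"]
    return result
```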

The fields for additional info and structures are:

Supported Kafka Versions

Tested with the following Kafka versions:

Kafka 0.8 is not supported.

See the compatibility spec for the full list of executed compatibility checks. To execute the compatibility checks, run make compatibility. Running the checks requires Docker.

Building

Run make deps to restore the dependencies using govendor, then run make to build.

Prerequisites

Notable Details on Health Check Behavior