confluentinc / cp-docker-images

[DEPRECATED] Docker images for Confluent Platform.

How to avoid the data loss on zookeeper/kafka server restart? #825

Closed · raghala99 closed this issue 4 years ago

raghala99 commented 4 years ago

I'm using docker-compose to start Zookeeper and Kafka using https://github.com/confluentinc/cp-docker-images/blob/5.1.1-post/examples/kafka-single-node/docker-compose.yml

After restarting ZooKeeper and Kafka, the data is lost, whether or not the messages have been consumed. Please tell me how to retain the data after a server restart.

OneCricketeer commented 4 years ago

This will happen for any Docker container. Have you learned about volume mounts?

You will need to define them for ZooKeeper's dataDir and Kafka's log.dirs.
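For reference, a minimal sketch of such mounts on top of the linked single-node compose file; the host paths (./zk-data, ./zk-txn-logs, ./kafka-data) are illustrative assumptions, while the container paths are the defaults used by the cp-zookeeper and cp-kafka images.

```yaml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.1.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    volumes:
      - ./zk-data:/var/lib/zookeeper/data      # ZooKeeper dataDir
      - ./zk-txn-logs:/var/lib/zookeeper/log   # ZooKeeper dataLogDir
  kafka:
    image: confluentinc/cp-kafka:5.1.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./kafka-data:/var/lib/kafka/data       # Kafka log.dirs
```

With these mounts in place, topic data and ZooKeeper state live on the host and survive `docker-compose down`/`up` cycles.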

raghala99 commented 4 years ago

Thanks for the suggestion.

albertmatyi commented 4 years ago

One must mount the following two dirs, since they are declared as VOLUMEs in the Dockerfile:

/var/lib/kafka/data and /etc/kafka/secrets

Note that mounting the parent dir is not an option; you have to handle each of these paths specifically, e.g. as in the sketch below.
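As a sketch, a Compose fragment that mounts both paths explicitly (the named volumes kafka-data and kafka-secrets are assumptions):

```yaml
services:
  kafka:
    image: confluentinc/cp-kafka:5.2.1
    volumes:
      - kafka-data:/var/lib/kafka/data     # declared VOLUME in the Dockerfile
      - kafka-secrets:/etc/kafka/secrets   # declared VOLUME in the Dockerfile

# named volumes; the names are illustrative
volumes:
  kafka-data:
  kafka-secrets:
```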

Another issue we've faced: if we provisioned /var/lib/kafka/data with a disk (EBS storage) that contained a lost+found directory, then the whole container didn't start.

In this case we needed to adapt our deployment's startup command to be:



```yaml
# kafka-deployment.yml
[... k8s mumbo jumbo ...]
# spec.template.spec.containers:
  - name: kafka
    image: confluentinc/cp-kafka:5.2.1
    command: ["sh"]
    args: ["-c", "rm -rf /var/lib/kafka/data/lost+found ; /etc/confluent/docker/run"]
[... more k8s mumbo jumbo ...]
```

thus ensuring that a possible `lost+found` dir is removed from the freshly mounted dir before startup.
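An alternative, not from this thread, is to do the same cleanup in a Kubernetes initContainer so the main container's command stays untouched; this is only a sketch, and the container/volume names and busybox tag are assumptions:

```yaml
# spec.template.spec:
      initContainers:
        - name: clean-lost-found            # hypothetical name
          image: busybox:1.31
          command: ["sh", "-c", "rm -rf /var/lib/kafka/data/lost+found"]
          volumeMounts:
            - name: kafka-data              # hypothetical volume name
              mountPath: /var/lib/kafka/data
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:5.2.1
          volumeMounts:
            - name: kafka-data
              mountPath: /var/lib/kafka/data
```
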
OneCricketeer commented 4 years ago

FWIW, just use Compose, not k8s... Context: http://kelda.io