31z4 / zookeeper-docker

Docker image packaging for Apache Zookeeper
MIT License

Healthcheck #79

Closed ckosmowski closed 4 years ago

ckosmowski commented 5 years ago

Expected behavior

If ZooKeeper inside the container becomes unreachable, or any error occurs that keeps ZooKeeper from working, the container should fail its healthcheck so that the Docker engine can restart it.

Actual behavior

No matter what happens to ZooKeeper inside, the container keeps running and appears healthy. This leads to broken connections and an unreachable ZooKeeper, and operations does not even notice.

Steps to reproduce the behavior

Start an instance of the zookeeper image. Break ZooKeeper inside the container (we're currently investigating what causes the crashes and connection aborts, so we don't know exactly what breaks the connections).

But we get lots of this stuff:

Refusing session request for client /172.31.0.1:46016 as it has seen zxid 0x4000000000 our last zxid is 0x3e00000cc9 client must try another server,

System configuration

Docker swarm, three instances (however, since the healthcheck is missing inside the image itself, we think this is irrelevant to the issue).

31z4 commented 4 years ago

Hey @ckosmowski, sorry to say this, but it looks like there is no chance of a healthcheck being merged into the official docker library because of the maintainers' position. Please see https://github.com/docker-library/postgres#282 and https://github.com/docker-library/cassandra/pull/76#issuecomment-426816911 for details.

Here are some ideas for implementing a healthcheck in a custom image: https://github.com/31z4/zookeeper-docker/pull/28.
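
For illustration, one possible probe for such a custom image is sketched below. This is not the approach from the linked pull request, only a minimal sketch built on ZooKeeper's `ruok` four-letter command, which a running server answers with `imok`. It assumes that a Python interpreter is available in the custom image (the stock image may not ship one), that the client port is 2181, and that `ruok` is allowed by the server's `4lw.commands.whitelist` setting. The names `zk_is_ok` and `main` are hypothetical.

```python
import socket


def zk_is_ok(host="localhost", port=2181, timeout=5.0):
    """Return True if the server answers the 'ruok' command with 'imok'.

    The host, port, and timeout defaults are assumptions for illustration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            # Read until the peer closes the connection.
            while True:
                data = sock.recv(16)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks) == b"imok"
    except OSError:
        # Connection refused, timeout, reset, and similar failures
        # all count as "not healthy".
        return False


def main():
    """Map the probe result to a healthcheck-style exit code (0 = healthy)."""
    return 0 if zk_is_ok() else 1
```

A healthcheck command could run this script and exit with `sys.exit(main())`. Note that `imok` only shows that the server process is up and answering; it does not necessarily mean the server has joined the quorum, so a probe like this may not catch every failure described in this issue.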