Graylog2 / graylog-docker

Official Graylog Docker image
https://hub.docker.com/r/graylog/graylog/
Apache License 2.0

healthcheck doesn't work in docker swarm #101

Closed Chilinot closed 4 years ago

Chilinot commented 4 years ago

The healthcheck fails to resolve the hostname of the service when that hostname is used in the http_publish_uri config. More specifically, I'm hitting this issue and using its workaround: https://github.com/moby/moby/issues/35451#issuecomment-405879306

It would be nice if there were a way to configure only the healthcheck, without having to change the http_publish_uri value.

jalogisch commented 4 years ago

Can you think of any way to make this possible? I do not have a Docker Swarm environment to test and develop in.

Chilinot commented 4 years ago

I don't have any good ideas. Right now I have defined the hostname directly on the service (as in solution 3 in the linked issue above), which shouldn't be an issue, but time will tell.

Chilinot commented 4 years ago

I guess you could add a flag that tests 127.0.0.1 instead, which would only check that localhost can reach the API. But I can't really think of a scenario right now that would make the container accept only localhost and not remote connections. Maybe if you configured it badly.

tdac42 commented 4 years ago

We had the same issue with our swarm. Our workaround was to override the healthcheck in the Docker Compose file:

healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/api"]
      interval: 10s
      timeout: 10s
      retries: 10
      start_period: 30s

and removing the healthcheck from the Dockerfile:

- # add healthcheck
- #HEALTHCHECK \
- #  --interval=10s \
- #  --timeout=10s \
- #  --retries=15 \
- #  CMD /health_check.sh

Not sure if this is 'the best way' but it allowed us to use our Graylog cluster again ;-)

jalogisch commented 4 years ago

@Chilinot @tdac-highwire

I'm thinking of adding a second check to the health_check: it would first try the http_publish_uri, because that is the official way, and fall back to checking localhost if that is not reachable. That might work.

What do you think?
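
For illustration, the fallback idea could be sketched roughly as below. This is not the image's actual health_check.sh; the function names `probe` and `check_health` are hypothetical, and the localhost URL is simply the one used in the compose override earlier in this thread.

```shell
#!/bin/sh
# Sketch of a health check with a localhost fallback.
# Assumptions (not taken from the real health_check.sh):
#   - the caller passes the publish URI and the fallback URL as arguments
#   - a plain HTTP GET that returns a non-error status counts as healthy

# probe URL: succeed if the URL answers with a non-error HTTP status.
probe() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# check_health PRIMARY_URL FALLBACK_URL
# Try the primary URL first; if it is empty or unreachable, try the fallback.
check_health() {
    primary=$1
    fallback=$2
    if [ -n "$primary" ] && probe "$primary"; then
        return 0
    fi
    probe "$fallback"
}
```

A health check script could then end with something like `check_health "$PUBLISH_URI" "http://localhost:9000/api" || exit 1`, where `PUBLISH_URI` is a placeholder for however the script obtains the configured http_publish_uri. The trade-off raised above still applies: the fallback only proves that the API is reachable from inside the container.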