Graylog2 / graylog-docker

Official Graylog Docker image
https://hub.docker.com/r/graylog/graylog/
Apache License 2.0

Swarm - Docker image healthcheck never passes #271

Closed · Nathan-Nesbitt closed 5 months ago

Nathan-Nesbitt commented 5 months ago

It seems that on the first run in the swarm, the container never reports a positive healthcheck once it reaches this point:

It seems you are starting Graylog for the first time. To set up a fresh install, a setup interface has
been started. You must log in to it to perform the initial configuration and continue.
Initial configuration is accessible at 0.0.0.0:9000

When I go to visit the URL I run into 2 issues:

  1. The proxy never routes the traffic as it detects the container is in a bad state
  2. Docker restarts the container as it is viewed as unhealthy

Since the only way to set up the server appears to be logging in for the first time with the generated credentials, I imagine the healthcheck should pass at this point so the previous 2 issues are avoided? Docker seems to check the health status a few times before killing the container, and it fails each time:

(screenshot: Docker health status checks failing repeatedly)

These are the logs at the end:

(screenshot: tail of the container logs)

Maybe it's a config issue? I'm not sure how the healthcheck is done, so I cannot go much further.

Expected Behavior

The Graylog Docker container should respond with a positive healthcheck while the server is waiting for the admin to set up the fresh installation.

Current Behavior

The server doesn't respond with a healthy status, and when that persists for an extended period of time in a swarm, the container is automatically restarted under the assumption that it is stalled/dead.

Possible Solution

The container should report healthy once it exposes the setup port and is waiting for the client to complete the initial configuration. As far as I can tell the container itself is working as expected: there are no errors in the log, and I can run it outside of a swarm.
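In the meantime, a stack-level workaround could be to relax or override the image's built-in healthcheck from the compose/stack file. This is only a minimal sketch, assuming curl is available inside the image and that answering on the setup port 9000 is an acceptable health signal; the image tag and thresholds are placeholders:

```yaml
services:
  graylog:
    image: graylog/graylog:5.2   # tag for illustration only
    healthcheck:
      # Assumption: the setup UI answering on port 9000 counts as healthy,
      # so swarm does not restart the container before initial configuration.
      test: ["CMD-SHELL", "curl -fs http://localhost:9000/ || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s
```

Compose/stack files also accept `disable: true` under `healthcheck` to skip the built-in check entirely, at the cost of losing health-based routing in the proxy.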

Steps to Reproduce (for bugs)

  1. Start a swarm
  2. Start a proxy pointing to the service that does health checks
  3. Run the service in the swarm (see the stack sketch below)
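For concreteness, here is a minimal sketch of the kind of service definition involved; the image tag, secrets, and service layout are placeholders, and the MongoDB and search backend services are omitted. Deploying something like this with `docker stack deploy -c docker-stack.yml graylog` behind a health-aware proxy should reproduce the restart loop on a fresh install:

```yaml
# docker-stack.yml -- deploy with: docker stack deploy -c docker-stack.yml graylog
version: "3.8"
services:
  graylog:
    image: graylog/graylog:5.2                 # tag for illustration only
    environment:
      GRAYLOG_HTTP_EXTERNAL_URI: "http://127.0.0.1:9000/"
      GRAYLOG_PASSWORD_SECRET: "replace-with-a-long-random-secret"
      GRAYLOG_ROOT_PASSWORD_SHA2: "replace-with-sha256-of-admin-password"
    ports:
      - "9000:9000"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
  # MongoDB and the search backend (OpenSearch / Data Node) services are
  # omitted for brevity; the proxy (e.g. traefik) routes to graylog based
  # on its reported health status.
```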

Context

Recently upgraded to a newer version of Graylog, which was being used for server logging for multiple applications within Docker Swarm. I cannot get the application going now, so no logging :(

Your Environment

graylog --> portainer --> traefik

janheise commented 5 months ago

Hi @Nathan-Nesbitt, I have no idea about swarm, but I would like to ask for some more info: this is an existing setup that you upgraded, right? And you want to connect to an existing Elasticsearch 7.10.2?

In this case, you should specify the connection string to Elasticsearch in your docker-compose files (https://go2docs.graylog.org/5-2/setting_up_graylog/graylog_data_node_getting_started.htm?tocpath=Setting%20up%20Graylog%7CGraylog%20Data%20Node%7C_____1)

something like `GRAYLOG_ELASTICSEARCH_HOSTS: "http://opensearch1:9200,http://opensearch2:9201,http://opensearch3:9202"`

This should at least skip the "waiting for the initial setup" step - which in your case is not necessary.
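In a compose/stack file, that would look roughly like the following; the service names and ports follow the example above and are assumptions about the actual setup:

```yaml
services:
  graylog:
    environment:
      # Point Graylog at the existing Elasticsearch/OpenSearch nodes so the
      # initial Data Node setup step is skipped.
      GRAYLOG_ELASTICSEARCH_HOSTS: "http://opensearch1:9200,http://opensearch2:9201,http://opensearch3:9202"
```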

Nathan-Nesbitt commented 5 months ago

@janheise great workaround, thank you, it's back up 💯

Looks like this is still a problem though, as I won't be able to redeploy on any new machines. Let me know how I can help diagnose / narrow this down and I'll do what I can to help!

janheise commented 5 months ago

@Nathan-Nesbitt - I'm happy that I was able to help. As I wrote, I have no experience with swarm. With redeploy, do you mean "set up a whole new cluster from scratch" or "add new machines to an existing cluster"? As long as the MongoDB stays intact (unless, as I said, you want to start from 0), adding new machines should work out fine.

A completely new setup should work fine, too, if you use plain OpenSearch/Elasticsearch. For the DataNode, I'd have to make some tests with swarm - but with some manual pre-configuration, it should work, too.

Nathan-Nesbitt commented 5 months ago

@janheise Totally, I am mostly concerned about the case where I need to set this up on a whole new cluster!

What would be involved with the manual configuration? :)

janheise commented 5 months ago

@Nathan-Nesbitt Let me reiterate: if you use plain OpenSearch for new setups, nothing changes except the elasticsearch_hosts setting, which is now mandatory.

The first steps we undertook with the DataNode were to simplify the SSL configuration for your setups by adding a UI, etc. You can also generate your own certificates, add them to the config manually, and by doing so skip the initial configuration, too. We'll probably support swarm installations etc. better in upcoming releases. 5.2 is our first release with the DataNode. I'll put "test with and support swarm" on our to-do list.

janheise commented 5 months ago

Problem seems to be fixed.