Graylog2 / docker-compose

A set of Docker Compose files that allow you to quickly spin up a Graylog instance for testing or demo purposes.
Apache License 2.0

docker-compose ignoring depends_on and entrypoint on server restart #30

Open dreadedhamish opened 1 year ago

dreadedhamish commented 1 year ago

I've noticed that when I restart my server, Docker restarts the graylog container but not mongodb or opensearch. I'm aware these examples are for testing, and that they were previously changed to restart: "on-failure" rather than always, but I still don't think this is expected behaviour.

But this may not be a graylog or even a docker-compose issue - please correct me where I'm wrong:

Current behaviour:

  - docker compose up: all containers start
  - docker compose up graylog: all containers start
  - restart server: graylog starts, other containers don't

Expected behaviour: restart server: all containers previously running start, and graylog waits for mongodb and opensearch to be ready.

There are 2 failsafes in the docker-compose file for making sure the requisite services are started before graylog starts:

  1. depends_on
  2. entrypoint wait-for-it.sh waits for opensearch to be available at opensearch:9200

re. 1 - apparently the Docker engine doesn't know about docker-compose settings, so depends_on is ignored and it just restarts previously running containers. Not ideal, but regardless, in this instance it only restarts graylog.

re. 2 - shouldn't the wait-for-it.sh script fire even when the container is restarted after a server restart?
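On point 2, it may help to spell out what a wait-for-it-style check does: it is a retry loop around a TCP connect, and it can only run when the container's entrypoint itself runs. The sketch below illustrates the idea in Python. The real wait-for-it.sh is a Bash script, and the function name, timeout, and polling interval here are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open,
            # not that the service behind it is ready for queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, timeouts, and name lookup failures.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("opensearch", 9200)` would mirror the check described in the list above. If opensearch is never started after the reboot, a loop like this can only block or time out, because waiting on a port does not start the service behind it.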

Any ideas?

dreadedhamish commented 1 year ago

Troubleshooting further, it seems my issue stems from an unclean shutdown of the graylog docker container - mongodb and opensearch shut down correctly and so don't trigger the restart on-failure condition, but graylog apparently does not shut down cleanly.

My operating system (Manjaro) shuts down very quickly, and is probably not allowing enough time. I've since added graceful shutdown to all 3 containers, but that hasn't helped. I'm now crawling through logs trying to find clues as to why graylog is crashing, or sending the wrong signal, or whether Manjaro is just not waiting for an all-clear.

dreadedhamish commented 1 year ago

Here are the logs from shutdown for graylog (from the operating system logs) - nothing jumps out at me:

20:05:18 dockerd: time="2023-06-26T20:05:18.916975194+10:00" level=warning msg="ShouldRestart failed, container will not be restarted" container=ade8dd9f6c3cdc014db073546cd1e195e781748e6dc9b42a3fb7322419e0fa60 daemonShuttingDown=true error="restart canceled" execDuration=19m30.178415309s exitStatus="{143 2023-06-26 10:05:17.589297607 +0000 UTC}" hasBeenManuallyStopped=false restartCount=0

20:05:18 dockerd: time="2023-06-26T20:05:18.899268741+10:00" level=info msg="ignoring event" container=ade8dd9f6c3cdc014db073546cd1e195e781748e6dc9b42a3fb7322419e0fa60 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

20:05:18 containerd: time="2023-06-26T20:05:18.899190691+10:00" level=warning msg="cleaning up after shim disconnected" id=ade8dd9f6c3cdc014db073546cd1e195e781748e6dc9b42a3fb7322419e0fa60 namespace=moby

20:05:18 containerd: time="2023-06-26T20:05:18.899070267+10:00" level=info msg="shim disconnected" id=ade8dd9f6c3cdc014db073546cd1e195e781748e6dc9b42a3fb7322419e0fa60 namespace=moby

20:05:18 systemd: docker-ade8dd9f6c3cdc014db073546cd1e195e781748e6dc9b42a3fb7322419e0fa60.scope: Consumed 1min 53.928s CPU time.
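One detail in these lines can be decoded: exitStatus 143. By the usual convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 143 corresponds to signal 15, SIGTERM. A small Python sketch of that arithmetic follows. The helper name is hypothetical and not part of Docker or Graylog.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "clean exit"
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"error exit {code}"


# Exit status taken from the dockerd log line above.
print(describe_exit_code(143))  # prints "terminated by SIGTERM"
```

Whatever the cause, 143 is non-zero, and a non-zero exit code is what an on-failure restart policy treats as a failure.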

dreadedhamish commented 1 year ago

Here are the last few lines of the docker logs - does this look like a successful shutdown?

{"log":"\u0009at org.graylog2.database.PersistedServiceImpl.save(PersistedServiceImpl.java:198)\n","stream":"stderr","time":"2023-06-26T10:05:17.031969692Z"}

{"log":"\u0009at org.graylog2.system.activities.SystemMessageActivityWriter.write(SystemMessageActivityWriter.java:56)\n","stream":"stderr","time":"2023-06-26T10:05:17.031975488Z"}

{"log":"\u0009at org.graylog2.commands.Server$ShutdownHook.run(Server.java:326)\n","stream":"stderr","time":"2023-06-26T10:05:17.031980878Z"}

{"log":"\u0009at java.base/java.lang.Thread.run(Unknown Source)\n","stream":"stderr","time":"2023-06-26T10:05:17.031986271Z"}

dreadedhamish commented 1 year ago

I'm a little out of my depth here, but it looks like the Java graceful shutdown issue that has popped up a few times before:

https://github.com/Graylog2/graylog-docker/issues/173
https://github.com/Graylog2/graylog-docker/issues/87

dreadedhamish commented 1 year ago

Okay so I think there are 2 issues:

  1. it looks like the "Java not shutting down gracefully" bug is back again, so that is an issue for the docker container repository, not the docker-compose repository, and
  2. Docker has some shortcomings: the Docker engine doesn't know about depends_on values in docker-compose, so in cases like the above, although graylog will restart because it failed, the restart won't trigger the depends_on condition. Using "unless-stopped" on all containers brought them all up after a restart, to my surprise, as I thought it would have counted mongo and opensearch as having been stopped.