Graylog2 / graylog-docker

Official Graylog Docker image
https://hub.docker.com/r/graylog/graylog/
Apache License 2.0
forwarder: add health-check #137

Open jalogisch opened 3 years ago

jalogisch commented 3 years ago

We should add a health check to the forwarder image so that users can have the forwarder restarted automatically on failure.

This needs to be added to the image, as in:

https://github.com/Graylog2/graylog-docker/blob/4.0/docker/oss/Dockerfile#L132-L137

and should check whether the process in the image is running properly.
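
A minimal sketch of what such a process check could look like, assuming bash and a Linux /proc filesystem in the image. The `FORWARDER_PROCESS_PATTERN` variable, its default value `java`, and the `--check` flag are hypothetical names chosen for illustration; none of them exist in the current image:

```shell
#!/bin/bash
# Hypothetical process-based health check (a sketch, not the project's script).
# FORWARDER_PROCESS_PATTERN is an assumed variable; "java" is an assumed default
# that should be narrowed to match the forwarder's actual command line.
PATTERN="${FORWARDER_PROCESS_PATTERN:-java}"

# process_running PATTERN
# Returns 0 if any process other than this shell has PATTERN as a literal
# substring of its command line, 1 otherwise.
process_running() {
  local pattern="$1" cmdline_file cmdline
  for cmdline_file in /proc/[0-9]*/cmdline; do
    # Skip this shell so the check cannot match itself.
    [ "$cmdline_file" = "/proc/$$/cmdline" ] && continue
    # cmdline is NUL-separated; turn NULs into spaces. A process may exit
    # between the glob expansion and the read, so errors are silenced.
    cmdline="$({ tr '\0' ' ' < "$cmdline_file"; } 2>/dev/null)" || continue
    case "$cmdline" in
      *"$pattern"*) return 0 ;;
    esac
  done
  return 1
}

# Run the check only when called with --check, so the file can also be sourced.
if [ "${1:-}" = "--check" ]; then
  process_running "$PATTERN"
  exit $?
fi
```

Called with `--check`, the script exits 0 when a matching process exists and 1 otherwise, which is the exit-code convention a Docker `HEALTHCHECK` command expects. Note that this only detects a missing process; it says nothing about whether the forwarder is actually forwarding.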

malcyon commented 3 years ago

I see that the health check for the graylog image is using the REST API for the Docker health check:

https://github.com/Graylog2/graylog-docker/blob/4.0/health_check.sh#L96

But I don't think the forwarder is listening on any network ports. At least, I don't see any in the container:

root@0dd04a31debb:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 18:22 pts/0    00:00:00 tini -- /forwarder-entrypoint.sh /bin/bash
root           7       1  6 18:22 pts/0    00:00:11 /usr/local/openjdk-8/bin/java -Xms1g -Xmx1g -XX:-OmitStackTraceInFastThrow -jar /usr/share/gr
root         129       0  0 18:23 pts/1    00:00:00 /bin/bash
root         375     129  0 18:23 pts/1    00:00:00 bash
root         743     375  0 18:25 pts/1    00:00:00 ps -ef
root@0dd04a31debb:/# netstat -anlp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 172.17.0.2:46284        34.203.170.187:13302    ESTABLISHED 7/java              
tcp        0      0 172.17.0.2:52336        199.232.10.132:80       TIME_WAIT   -                   
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ]         STREAM     CONNECTED     1883982  7/java          
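
Since the only socket of interest above is the outbound ESTABLISHED connection to remote port 13302, one option would be to check for that connection instead of a listening port. The following is only a sketch under that assumption: the function name is made up, it reads the IPv4 table in /proc/net/tcp (not /proc/net/tcp6), and whether a live connection is a meaningful health signal for the forwarder is not confirmed anywhere in this thread:

```shell
#!/bin/bash
# Hypothetical connection-based check (a sketch under the assumptions above).

# has_established_connection PORT [TABLE_FILE]
# Returns 0 if TABLE_FILE (default: /proc/net/tcp) lists at least one
# connection in state 01 (ESTABLISHED) whose remote port equals PORT.
has_established_connection() {
  local port_hex table="${2:-/proc/net/tcp}"
  # /proc/net/tcp stores ports as 4-digit uppercase hex, e.g. 13302 -> 33F6.
  port_hex="$(printf '%04X' "$1")"
  awk -v port="$port_hex" '
    NR > 1 {                      # skip the header line
      split($3, remote, ":")      # $3 is rem_address as HEXIP:HEXPORT
      if ($4 == "01" && toupper(remote[2]) == port) found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$table"
}
```

For example, `has_established_connection 13302` would succeed in the container shown above while the java process holds its connection, and fail once that connection is gone. A connection in TIME_WAIT, like the one to port 80, does not count.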

I think Docker can restart on failure the way it's configured now, but its definition of "failure" is the process exiting.

@danotorrey What's a good way to check the health of the forwarder?