SumoLogic / sumologic-collector-docker

A Sumo Logic collector for Docker.
Apache License 2.0

Container does not respond to sigterm #61

Open samgurtman-zz opened 6 years ago

samgurtman-zz commented 6 years ago

On SIGTERM (signal 15), the Docker container does not seem to exit. Instead, Docker times out and sends a SIGKILL.

Might this be due to the -t option?

https://github.com/SumoLogic/sumologic-collector-docker/blob/d4ea5500ecbcdcbf619c4296ae143375f33e6d20/run.sh#L119

bin3377 commented 6 years ago

-t should not be related here. It means the collector registration on the backend is ephemeral (i.e. it will be deleted after the collector has been offline for a certain period of time).

What scenario do you want SIGTERM to work for? In most cases, we assume that "container running == collector running", so you should pause/stop/remove the container if you want to stop the collector.

samgurtman-zz commented 6 years ago

It causes shutdown to take much longer, and it hampers error detection. Docker best practice is to respond to SIGTERM sent to the root process. The SIGKILL that's sent after the timeout is a workaround for processes that don't shut down gracefully.
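
To make the TERM-then-KILL sequence concrete, here is a minimal sketch in plain sh. It is not Docker's implementation: `stop_with_grace` is a hypothetical helper, the two `sh -c` loops are stand-ins for a container's root process, and the short grace periods are chosen only to keep the demo quick.

```shell
#!/bin/sh
# stop_with_grace is a hypothetical helper, not a Docker command. It mimics the
# stop sequence described above: TERM first, KILL once the grace period runs out.
stop_with_grace() {
    pid=$1
    grace=$2
    kill -TERM "$pid"
    # Watchdog: escalate to KILL if the process outlives the grace period.
    ( sleep "$grace"; kill -KILL "$pid" ) >/dev/null 2>&1 &
    watchdog=$!
    wait "$pid"
    status=$?
    kill "$watchdog" 2>/dev/null
    return "$status"
}

# A stand-in process that handles TERM exits on its own.
sh -c 'trap "exit 0" TERM; while :; do sleep 1; done' &
graceful_pid=$!
sleep 1   # give the child time to install its trap
stop_with_grace "$graceful_pid" 3
graceful_status=$?

# A stand-in process that ignores TERM is only stopped by the KILL after the timeout.
sh -c 'trap "" TERM; while :; do sleep 1; done' &
stubborn_pid=$!
sleep 1
stop_with_grace "$stubborn_pid" 2
stubborn_status=$?

echo "graceful=$graceful_status stubborn=$stubborn_status"
```

By shell convention, an exit status of 137 (128 + 9) means the process was ended by SIGKILL, which is how a forced stop can be told apart from a clean exit.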

bin3377 commented 6 years ago

I did some investigation. It looks like the collector handles SIGTERM, but somehow when running docker stop the signal is not passed to the collector process. The evidence is that the signal is received when using kill <pid> from another console attached inside the container:

$ docker exec -it 04a47d8088259b0f2fe98ccb525615298513d1f075991dc7361d5c96742415de /bin/bash
root@04a47d808825:/# tail -f /opt/SumoCollector/logs/collector.out.log
INFO   | jvm 1    | 2018/01/24 00:18:45 | `+.|=|`+. |  | |  | |  |  | |  | | |  | |  |
INFO   | jvm 1    | 2018/01/24 00:18:45 | .    |  | |  | |  | |  |  | |  | | |  | |  |
INFO   | jvm 1    | 2018/01/24 00:18:45 | |`+. |  | |  | |  | |  |  | |  | | |  | |  |
INFO   | jvm 1    | 2018/01/24 00:18:45 | `+.|=|.+' `+.|=|.+' `+.|  |.|  |+' `+.|=|.+'
INFO   | jvm 1    | 2018/01/24 00:18:45 | Sumo Logic Collector Version 19.209-25
INFO   | jvm 1    | 2018/01/24 00:18:45 | Sumo Logic Build Hash a76f595
INFO   | jvm 1    | 2018/01/24 00:18:45 | current folder:/opt/SumoCollector
INFO   | jvm 1    | 2018/01/24 00:18:45 |   * See /opt/SumoCollector/./logs for more details.
INFO   | jvm 1    | 2018/01/24 00:18:45 |   * Connecting to https://nite-events.sumologic.net.
INFO   | jvm 1    | 2018/01/24 00:18:48 |   * Retrieved configuration from service.
STATUS | wrapper  | 2018/01/24 00:23:00 | TERM trapped.  Shutting down.

The last line indicates that a TERM signal was handled properly, but that line does not appear when using docker stop. I was just kicked out because the container stopped:

$ docker exec -it 04a47d8088259b0f2fe98ccb525615298513d1f075991dc7361d5c96742415de /bin/bash
root@04a47d808825:/# tail -f /opt/SumoCollector/logs/collector.out.log
INFO   | jvm 1    | 2018/01/24 00:17:57 | `+.|=|`+. |  | |  | |  |  | |  | | |  | |  |
INFO   | jvm 1    | 2018/01/24 00:17:57 | .    |  | |  | |  | |  |  | |  | | |  | |  |
INFO   | jvm 1    | 2018/01/24 00:17:57 | |`+. |  | |  | |  | |  |  | |  | | |  | |  |
INFO   | jvm 1    | 2018/01/24 00:17:57 | `+.|=|.+' `+.|=|.+' `+.|  |.|  |+' `+.|=|.+'
INFO   | jvm 1    | 2018/01/24 00:17:57 | Sumo Logic Collector Version 19.209-25
INFO   | jvm 1    | 2018/01/24 00:17:57 | Sumo Logic Build Hash a76f595
INFO   | jvm 1    | 2018/01/24 00:17:57 | current folder:/opt/SumoCollector
INFO   | jvm 1    | 2018/01/24 00:17:57 |   * See /opt/SumoCollector/./logs for more details.
INFO   | jvm 1    | 2018/01/24 00:17:57 |   * Connecting to https://nite-events.sumologic.net.
INFO   | jvm 1    | 2018/01/24 00:17:59 |   * Retrieved configuration from service.

samgurtman-zz commented 6 years ago

Yes, this is usually because the start script has forked the process or run it in a subshell instead of using exec.
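
The difference can be seen with a small sketch that does not involve the collector at all. Each launcher below prints its own PID and then the PID of the command it starts. The inner `sh -c` is only a stand-in for the real process, and the variable names are made up for the illustration.

```shell
#!/bin/sh
# Each launcher prints two PIDs: its own, then the PID of the command it starts.

# Without exec, the command runs as a child process with its own PID.
# The trailing ":" keeps the shell from turning the last command into an implicit exec.
forked=$(sh -c 'echo $$; sh -c "echo \$\$"; :')

# With exec, the command replaces the launcher and keeps its PID, so a signal
# sent to the launcher's PID is delivered to the command itself.
execed=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

printf 'without exec: %s\n' "$(echo $forked)"
printf 'with exec:    %s\n' "$(echo $execed)"
```

In the first case the two PIDs differ, so a signal aimed at the launcher never reaches the command unless the launcher forwards it. In the second case they are the same.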

colinbjohnson commented 3 years ago

Here is a snapshot of the processes running within the container, along with my attempt to run kill -9 1 (kill process ID 1):

colinjohnson@cjohnson07 sumologic_docker_file % docker exec -it a9791bf5cdcc  /bin/bash
root@a9791bf5cdcc:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4904  1920 ?        Ss   03:42   0:00 /bin/sh /opt/SumoCollector/collector console
root        59  0.0  0.0  21800  3640 ?        Sl   03:42   0:00 /opt/SumoCollector/./wrapper /opt/SumoCollector/./config/wrapper.conf wrapper.syslog.ident=collector wrapper.pidfile=/opt/SumoCollector/./collector.pid wrapper.name=collector wrapper.displayname=SumoLogic 
root        61  2.3  2.4 4520024 298912 ?      Sl   03:42   0:16 /opt/SumoCollector/jre/bin/java -XX:+UseParallelGC -server -Djava.security.egd=file:/dev/./urandom -Xms64m -Xmx128m -Djava.library.path=./19.319-4/bin/native/lib -classpath ./19.319-4/lib/HikariCP-java7-2.
root       114  1.0  0.0  18516  3388 pts/0    Ss   03:54   0:00 /bin/bash
root       124  0.0  0.0  34412  2792 pts/0    R+   03:54   0:00 ps aux
root@a9791bf5cdcc:/# kill 1
root@a9791bf5cdcc:/# kill -9 1
root@a9791bf5cdcc:/# 

For the above container, if I do a kill -9 59 the container exits.

root@a9791bf5cdcc:/# kill -9 59
root@a9791bf5cdcc:/#
colinjohnson@cjohnson07 sumologic_docker_file %
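
The transcript above shows /bin/sh as PID 1 with the wrapper as its child (PID 59). On Linux, the first process in a PID namespace only receives signals for which it has installed a handler (SIGKILL sent from outside the namespace is the exception), so if that shell has no TERM trap, this would be consistent with kill 1, kill -9 1, and the SIGTERM from docker stop all having no visible effect. One possible approach, sketched below under that assumption, is for the entrypoint shell to trap TERM and relay it to its child. `run_and_forward` is a hypothetical helper and the `sh -c` loop is a stand-in for the wrapper process. This is not taken from the repository's run.sh.

```shell
#!/bin/sh
# run_and_forward is a hypothetical helper: it starts the real command as a
# child and relays TERM to it, the way an entrypoint script could.
run_and_forward() {
    child=
    term_received=0
    # Installing a handler means TERM is no longer dropped when this shell is PID 1.
    trap 'term_received=1; [ -n "$child" ] && kill -TERM "$child"' TERM
    "$@" &
    child=$!
    wait "$child"
    status=$?
    # A trapped signal makes wait return early, so wait again for the real exit status.
    if [ "$term_received" -eq 1 ]; then
        wait "$child"
        status=$?
    fi
    return "$status"
}

# Demo: the stand-in child exits with status 42 when it sees TERM.
run_and_forward sh -c 'trap "exit 42" TERM; while :; do sleep 1; done' &
wrapper=$!
sleep 1   # let both shells install their traps
kill -TERM "$wrapper"
wait "$wrapper"
wrapper_status=$?
echo "wrapper exited with $wrapper_status"
```

Starting the final command with exec, so that it becomes PID 1 itself, is the simpler alternative when the script has nothing left to do after launching it. The forwarding trap is mainly useful when the shell has to stay around.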