Closed WesCossick closed 4 years ago
It's not just Node. And yes, in a lot of cases you'll want to add an init system. Many people have suggested tini; I do recall some other contenders, but I don't remember their names. tini looks solid and has a good reputation (although I've never used it). For example, take a look at this article.
After a tremendous amount of research and testing, I've finally gotten to the bottom of what's going on. In case anyone else encounters a similar issue, I've documented what I've learned…
It turns out that Node.js running as PID 1 handles signals like SIGTERM and SIGINT just fine. In fact, Docker doesn't really recommend using Tini unless you need to:
To clear up confusion in the blogosphere, you don’t always need a “init” tool to sit between Docker and Node.js, and you should probably spend more time thinking about how your app stops gracefully. ... For those that know about init options like docker run --init or using tini in your Dockerfile, they are good backup options when you can’t change your app code, but it’s a much better solution to write code to handle proper signal handling for graceful shutdowns.
The process.on('SIGTERM', ...); handler I was working with wasn't being called, and since I knew it had been called successfully in the past, I had mistakenly attributed the root cause to our switch from Alpine to distroless. Then I assumed that the cause was downgrading from Node.js v12 to v10… also not the case.
After some further testing, I discovered that as soon as any asynchronous code began executing after the SIGTERM signal was sent, the Node.js app would die. Since the process.on('SIGTERM', ...); handler I was working with was inside an async function, it was never being called. But, if I placed the same process.on('SIGTERM', ...); handler outside of the async function, it would execute all the code up to the first asynchronous call, and then die prematurely.
I finally tracked the actual problem down to Prisma, which prevents apps from gracefully shutting down. This is discussed here: https://github.com/prisma/prisma/issues/2917. I was able to verify that Prisma is calling process.exit(0) inside its own signal handler, which is causing apps to exit prematurely if they need to run asynchronous cleanup code in a SIGTERM or SIGINT handler.
Thanks for the update! It surely will help others.
It seems like an issue with the python distroless image at least. Any ideas?
docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
All signals to the python process are ignored in this case.
All signals to the python process are ignored in this case.
It works in my testing.
$ docker pull gcr.io/distroless/python3
Using default tag: latest
latest: Pulling from distroless/python3
Digest: sha256:975006719a62860e116b88adeac9dc278d939ddbec5e62f74b2e19f28d8fd3a5
Status: Image is up to date for gcr.io/distroless/python3:latest
gcr.io/distroless/python3:latest
$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
... container (the Python process) pauses ...
In another terminal,
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
45fed20c6b84 gcr.io/distroless/python3 "/usr/bin/python3.5 …" 3 seconds ago Up 2 seconds quirky_jepsen
$ docker exec -it 45 python
Python 3.5.3 (default, Nov 18 2020, 21:09:16)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import signal
>>> print(open('/proc/1/cmdline', 'r').read())
/usr/bin/python3.5-cimport signal; signal.pause()
>>> os.kill(1, signal.SIGINT)
>>> $
Then in the original terminal,
$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
Traceback (most recent call last):
File "<string>", line 1, in <module>
KeyboardInterrupt
$
OTOH, in https://github.com/GoogleContainerTools/distroless/issues/550#issue-658578656, the NodeJs doc clearly states
a Node.js process running as PID 1 will not respond to SIGINT (CTRL-C) and similar signals.
What I meant is that it doesn't seem to work from outside of the container. Neither docker kill -s SIGTERM <containerid> nor kill -SIGTERM <pid> works. In comparison, something like gcr.io/google-containers/pause does handle external signals.
All signals to the python process are ignored in this case.
Still, this is not true.
$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
In another terminal, I do
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
751e3f71ad5a gcr.io/distroless/python3 "/usr/bin/python3.5 …" 4 seconds ago Up 3 seconds goofy_panini
$ docker kill -s SIGINT 75
75
And in the original terminal,
$ docker run --rm gcr.io/distroless/python3 -c 'import signal; signal.pause()'
Traceback (most recent call last):
File "<string>", line 1, in <module>
KeyboardInterrupt
$
SIGINT works here as well, but not SIGTERM. Is that expected? If I run it outside of distroless, it does quit upon receiving SIGTERM.
At least SIGKILL (yes, I did confirm) and SIGINT work. I am not a Python dev, so I don't know about what's with SIGTERM. But this is not a Distroless issue. The behavior is consistent outside Distroless.
Using the official Docker Hub python image,
$ docker pull python
Using default tag: latest
latest: Pulling from library/python
Digest: sha256:e2cd43d291bbd21bed01bcceb5c0a8d8c50a9cef319a7b5c5ff6f85232e82021
Status: Image is up to date for python:latest
docker.io/library/python:latest
$ docker run --name test --rm --entrypoint python python -c 'import signal; signal.pause()'
... container pauses ...
In another terminal,
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c6a21270f3d python "python -c 'import s…" 41 seconds ago Up 40 seconds test
$ docker kill -s SIGTERM test
test
$ docker kill -s SIGTERM test
test
$ docker kill -s SIGTERM test
test
$ docker kill -s SIGINT test
test
$ docker kill -s SIGINT test
Error response from daemon: Cannot kill container: test: No such container: test
$
SIGKILL will work because the OS hard kills the process. SIGINT works I believe because docker itself installs an INT handler, so you can do ^-c. You're right that the official python process doesn't seem to work either. So it seems probably related to being pid 1. How do you expect it to work gracefully in kubernetes though, which relies on SIGTERM for terminating pods? I guess that goes back to the idea of using tini or something. Thanks for your help!
SIGINT works I believe because docker itself installs an INT handler, so you can do ^-c.
I don't know what kind of INT handler you are talking about that is installed in which process at which level, but it's not that a process can modify another running process to magically install some signal handler to do something. In fact, no OS allows one process to modify another. A process can only send signals to others.
What is clear is that the python process registered an INT handler with its own stack-printing code; it is designed to receive and react to SIGINT. I was able to send SIGINT to python (how I can send signals is not important, BTW), and the python process did react to the signal in the tests above by printing out its internal stack trace to the console. It's the python process that does this printing in its own way, no one else.
Traceback (most recent call last):
File "<string>", line 1, in <module>
If you are saying that the Docker runtime doesn't just allow sending SIGTERM to a process in a container running on its Docker runtime, then that's a Docker runtime problem. (However, I don't think that is the case.)
But then, I found out the correct reason: https://hackernoon.com/my-process-became-pid-1-and-now-signals-behave-strangely-b05c52cc551c
Well PID 1 is special in Linux, amongst other things it ignores any signals unless a handler for that signal is explicitly declared
Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
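For anyone who wants to check this from inside the interpreter, here is a minimal sketch that reports how the current Python process is set up to treat SIGINT and SIGTERM. The helper name describe_disposition is made up for illustration; the only real API used is signal.getsignal from the standard library. On a normal CPython start-up I would expect SIGINT to report Python's own default_int_handler and SIGTERM to report the default action, which is the case the note above says PID 1 ignores.

```python
import signal


def describe_disposition(signum):
    """Return a short label for how this process currently treats signum."""
    handler = signal.getsignal(signum)
    if handler is None:
        # A handler exists, but it was not installed from Python code.
        return "handler not installed from Python"
    if handler == signal.SIG_DFL:
        # No handler: the kernel's default action applies (ignored for PID 1).
        return "default action (SIG_DFL)"
    if handler == signal.SIG_IGN:
        return "ignored (SIG_IGN)"
    # Anything else is a Python callable registered via signal.signal().
    return "Python-level handler: " + getattr(handler, "__name__", repr(handler))


if __name__ == "__main__":
    for signum in (signal.SIGINT, signal.SIGTERM):
        print(signal.Signals(signum).name, "->", describe_disposition(signum))
```

Running the same script as the container's entrypoint and as a normal process should print the same dispositions; what differs is only how the kernel treats the default action for PID 1.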
I meant when you do docker run <container> in your shell, docker itself handles SIGINT so you can do ^-c at the command line. I assume it passes SIGINT onwards to the process. You can see that docker CLI continues to run after doing docker run, it isn't replaced by the container process as far as I can tell.
But it's interesting that python itself seems to handle SIGINT as pid 1 but ignores SIGTERM completely. I still can't explain why that's the case - I guess this is something baked into python.
Doing this, I can get it to respond to SIGTERM:
docker run --rm gcr.io/distroless/python3 -c 'import signal; import sys; signal.signal(signal.SIGTERM, lambda a,b : sys.exit(0)); signal.pause()'
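Expanded into a small script, the same idea might look like the sketch below. The names ShutdownFlag, install and serve are made up for illustration; the only real APIs used are signal.signal and time.sleep. Instead of calling sys.exit(0) inside the handler, the handler only records the request, so the main loop can finish its cleanup before the process exits.

```python
import signal
import time


class ShutdownFlag:
    """Records that a termination signal has been received."""

    def __init__(self):
        self.requested = False
        self.signum = None

    def handle(self, signum, frame):
        # Keep the handler tiny: remember the request and return.
        self.requested = True
        self.signum = signum


def install(flag, signums=(signal.SIGTERM, signal.SIGINT)):
    """Register flag.handle for each signal so PID 1 no longer ignores it."""
    for signum in signums:
        signal.signal(signum, flag.handle)


def serve(flag, poll_interval=0.2):
    """Hypothetical main loop: work until shutdown is requested, then clean up."""
    while not flag.requested:
        time.sleep(poll_interval)  # placeholder for real work
    # Cleanup (flush buffers, close connections, ...) would go here.
    return 0
```

A script would call install(flag) once at start-up and then exit with the return value of serve(flag). With handlers registered explicitly like this, docker kill -s SIGTERM should stop the container even when python runs as PID 1.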
I wanted to add an update to prevent any further dissemination of misinformation or any possible confusion for future readers who are not familiar with signal handling.
SIGINT works I believe because docker itself installs an INT handler, so you can do ^-c
I should clearly point out that this is not true.
Even if the docker CLI didn't implement a SIGINT handler and completely ignored typing ^-c, the python process running inside a container on the Docker runtime can and will accept SIGINT and run its stack-pretty-printing handler code, whether it runs as PID 1 or not. Python is explicitly coded to do so. (Moreover, even if docker didn't have a SIGINT handler, it already provides other means to send signals to the running process (docker kill). And even if docker kill didn't exist, you could still send any signal to python, as demonstrated in https://github.com/GoogleContainerTools/distroless/issues/550#issuecomment-790755388.)
python itself seems to handle SIGINT as pid 1 but ignores SIGTERM completely. I still can't explain why that's the case
See the linked article in https://github.com/GoogleContainerTools/distroless/issues/550#issuecomment-790871783.
Thanks for the info, but I wasn't trying to disseminate misinformation. Just trying to understand where the issue lies - whether with distroless or elsewhere. The linked article itself I found a bit confusing. It states that "it ignores any signal with the default action". This makes it sound like the process is doing something special to ignore these signals. But all that is happening is that the default actions for those signals are not installed by the kernel for PID 1 as a special case.
In summary:
The docker CLI handles SIGINT itself, so you can do ^-c. If it didn't have this, the receiving process would not get the SIGINT at the command line (unless you did -it to attach the TTY to the process, probably). You can observe this pretty easily by sending SIGINT to the docker pid, and the python process then also gets the SIGINT. But this is mostly irrelevant to the discussion, which was about SIGTERM.
I believe both Java and golang seem to handle SIGTERM just fine as PID 1, so it's Python just not setting up its own handler for it, and relying on the OS default action to be installed. In my mind, this means it would make some sense for distroless Python to do something special to handle SIGTERM when running python as pid 1. Otherwise people will run into issues with it not stopping properly such as in K8s, unless they take special action.
In my mind, this means it would make some sense for distroless Python to do something special to handle SIGTERM when running python as pid 1.
IMO, this is out of the scope of Distroless. PID 1 means a lot on Linux and is supposed to take on a lot of responsibilities, including adopting orphan processes, reaping zombie processes, handling signals, etc. And the way you want to deal with these issues highly depends on your situation. People suggest different ideas and approaches for resolving these issues, and if you google, you'll find a lot of articles about what they think is their problem and what would be the best solution in their situation. There are many different ways and tools. For some, directly running your application process as PID 1 just works, and they think it's perfectly fine; no need for the usual PID 1 responsibilities. If your long-running process can spawn child processes and there is a possibility of zombie processes exhausting all PIDs, perhaps deploying a full init system may be what you want, which can also be a solution to your python SIGTERM issue. But some folks don't like this setup and argue that running a single process in a container is the best practice, since it lets the container runtime manage the lifecycle of an application; with multiple processes, there's added complexity to properly handle the app lifecycle. And in your situation, you only seem to care about whether SIGTERM terminates python or not, but for others, it's not just a matter of a particular signal working in a certain way. For example, you said "Java seems to handle SIGTERM just fine as PID 1", but in a containerized environment, some folks still think the way Java handles SIGTERM is problematic and often people write a wrapper script around it: #464
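To make the "wrapper" idea concrete, here is a rough sketch of a signal-forwarding launcher written in Python. This is only an illustration of the approach, not a replacement for tini or a full init system: it forwards signals to a single child and waits for it, and it does not reap orphaned grandchildren. The function name run_and_forward is made up; subprocess.Popen, Popen.send_signal and signal.signal are standard library calls.

```python
import signal
import subprocess
import sys


def run_and_forward(argv, forwarded=(signal.SIGTERM, signal.SIGINT)):
    """Start argv as a child process and relay the given signals to it.

    Returns the child's return code (negative if it was killed by a signal).
    """
    child = subprocess.Popen(argv)

    def forward(signum, frame):
        # Only signal a child that has not been reaped yet.
        if child.poll() is None:
            child.send_signal(signum)

    # The wrapper has explicit handlers, so it reacts even when it is PID 1.
    for signum in forwarded:
        signal.signal(signum, forward)

    return child.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    code = run_and_forward(sys.argv[1:])
    # Mirror the shell convention of 128 + signal number for signal deaths.
    sys.exit(128 - code if code < 0 else code)
```

Because the wrapped process is no longer PID 1, the default action for SIGTERM applies to it again and it terminates without needing its own handler.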
Currently, the Node.js distroless container runs the Node.js process as PID 1:
According to Node.js's best practices:
So basically, Node.js apps won't receive SIGTERM, SIGINT, etc. when running inside gcr.io/distroless/nodejs.