Open magJ opened 6 months ago
I run into the same issue from time to time. It happened on an earlier version of Overmind; I recently upgraded to the latest, 2.5.1, and it's still happening. I think the zombie process is the shell process, which in turn runs the app process.
I spent a day trying to debug this issue without much success. I suspect that it's actually a tmux bug, but I haven't been able to figure out a reliable way to reproduce it.
Hey there,
This is definitely a tmux bug: it isn't handling SIGCHLD properly.
From Overmind's point of view, the process is still running, since Overmind can still send signals to it. The only way to check whether a process is in the zombie state is to read its state file or to use the ps command. Neither is a good fit for polling at short intervals. And I believe that it's not imgproxy's duty to kill zombies.
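For anyone who wants to check by hand, here is a minimal sketch of the state-file approach. It assumes Linux and that the "state file" means /proc/<pid>/status; is_zombie is a hypothetical helper name, not something Overmind or tmux provides.

```shell
# Sketch: exit 0 if the given PID is a zombie, 1 if it is not,
# and 2 if the process (or /proc) cannot be read. Linux-only.
# is_zombie is a hypothetical helper name.
is_zombie() {
    # The State line looks like: "State:  Z (zombie)"
    state=$(awk '/^State:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] || return 2   # no such process, or /proc unavailable
    [ "$state" = "Z" ]
}
```

With the PID from the report, that would be something like: is_zombie 346 && echo "346 is a zombie".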
The workaround proposed in https://github.com/tmux/tmux/issues/311 should theoretically work: prepend your commands with trap 'pkill -CHLD tmux' 0; (or, equivalently, trap 'pkill -CHLD tmux' EXIT;).
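If prepending the trap to every command gets repetitive, the same idea can be sketched as a small wrapper function. This is only an illustration: run_with_nudge and the NUDGE_CMD override are hypothetical names, and the default command is the one from the tmux issue above.

```shell
# Sketch of the trap workaround as a reusable wrapper.
# run_with_nudge and NUDGE_CMD are hypothetical names; NUDGE_CMD exists
# only so the wrapper can be tried out without tmux running.
run_with_nudge() {
    (
        # Runs when this subshell exits, i.e. after the wrapped command ends.
        trap "${NUDGE_CMD:-pkill -CHLD tmux}" EXIT
        "$@"
    )
}
```

Usage would look like run_with_nudge node server.js, where node server.js stands in for whatever command the process normally runs. The subshell keeps the trap from leaking into the calling shell, and the wrapped command's exit status is passed through.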
To be honest, Overmind was never meant to run in production; it was developed mostly as a dev tool.
I'm using Overmind to run three processes. One of them, "api", a Node.js process, ran out of memory and crashed. However, Overmind still thinks that it's running.
It looks like the app process id "346" has become a zombie, but overmind has not detected it.
Overmind version:
2.4.0
Operating system: Debian bookworm, based on the Docker image node:20.11.1-bookworm-slim, and running on fly.io
This issue happened on two different machines, but I'm really struggling to reproduce it. It might be a tmux issue; it sounds similar to https://github.com/tmux/tmux/issues/311, but I really don't know.