greg closed this issue 3 years ago
It should use runguard indeed, so I'm not sure. @meisterT do you have any clue?
/cc @eldering
I don't know why this happens, I just tried on my local machine (without using docker) and runguard did limit the number of processes as expected. What value do you have configured in "Process limit"?
The default process limit is 64, I didn't realise that was an available setting. Should I set it to 1 or 0?
No, don't lower it that much: that limit includes the shell script, and potentially other programs that wrap the actual solution (e.g. the JVM with multiple threads) so lowering it to anything below 5-10 puts you at risk of random crashes because of running out of threads.
Runguard should enforce that limit, and should kill any processes after the judging run is done, so the error "found processes still running" is definitely a sign that something is not functioning as expected.
I have no experience with Docker, so it's difficult for me to say exactly what is wrong. However, seeing that those processes are all "defunct" (i.e. zombie processes, see https://en.wikipedia.org/wiki/Zombie_process), maybe they don't get properly reaped by init (pid=1) inside the container?
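To make the "defunct" state concrete, here is a minimal sketch (Linux only, since it reads /proc) of how an exited child stays a zombie until its parent waits for it. The helper names `process_state` and `wait_for_state` are made up for illustration; this is not how runguard or the judgehost checks for leftover processes.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reaches the wanted state; return True on success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False


if __name__ == "__main__":
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # The parent has not waited yet, so the exited child lingers as a zombie ("Z").
    print("zombie:", wait_for_state(pid, "Z"))
    os.waitpid(pid, 0)  # reaping removes the process table entry
    print("still listed:", os.path.exists(f"/proc/{pid}"))
```

A zombie holds no memory or CPU, only a process table entry, but it still shows up in process listings, which would explain a check for leftover processes tripping over it.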
@greg can you still reproduce this issue? I would like to fix it for you and possibly other users who have run into it.
And if you can reproduce it, do you get the same behaviour when you run the judgehost container interactively?
We've been running into similar issues with forked processes and have a potential solution. For reference, we are submitting the following Python script, which forks fewer than 64 times. This produces the same output as described above. It's interesting to note that this issue doesn't exist if you execute bash in the container and then manually run start.sh.
    import os

    for i in range(10):
        # os.fork() returns 0 in the child, so each child leaves the loop at once
        if os.fork() <= 0:
            break
Here's what I think is happening:
1. /scripts/start.sh runs as PID 1.
2. start.sh execs judgehost, hence judgehost assumes PID 1.
3. runguard executes the program along with its forks, creating children.
4. runguard exits, so the zombie children are assigned to PID 1.
5. judgehost/php doesn't reap children, as it isn't designed to have random children assigned to it.
6. testcase_run.sh discovers existing processes and exits.

The solution for this is to use an init system like dumb-init (see PR #83) or figure out why the zombie processes are being spawned.
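For illustration, the reaping step that is missing in the sequence above can be sketched as a non-blocking `os.waitpid(-1, os.WNOHANG)` loop. This is only an assumption about the general technique an init process uses, not dumb-init's actual code, and the function names `spawn_children` and `reap_all` are hypothetical.

```python
import os
import time


def spawn_children(n):
    """Fork n children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exits at once and stays a zombie until reaped
        pids.append(pid)
    return pids


def reap_all():
    """Reap every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    children = spawn_children(3)
    time.sleep(0.5)  # give the children time to exit
    print("spawned:", children)
    print("reaped:", reap_all())
```

Note that `waitpid` only collects children of the calling process. Orphaned processes are re-parented to PID 1 (or to a designated subreaper), so a loop like this only cleans them up when it runs there, which is why putting an init in front of judgehost helps.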
The PR https://github.com/DOMjudge/domjudge-packaging/pull/83 indeed solved the issue.
I start the judgehost with
and run (in another shell)
as in #11. The executables are all as provided, no modifications. I then submit the C solution (to hello world in the demo contest):
and this happens:
The container exits and the judgehost is down. Shouldn't runguard protect against stuff like this?