DOMjudge / domjudge

DOMjudge programming contest jury system
https://www.domjudge.org
GNU General Public License v2.0

Possible Linux kernel lock contention when running multiple judgedaemons per machine #2277

Open taoky opened 7 months ago

taoky commented 7 months ago

Description of the problem

When rejudging a large contest, or when many submissions arrive for problems with many testcases, some submissions may take much longer wall time than CPU time. With a short timelimit overshoot, these submissions may be judged as TLE even if they are correct.

This is what actually happened in a recent ICPC Asia Regional Contest (about 350 teams, and an easy problem with 50 testcases). After a lot of time spent bisecting the kernel and debugging, it turned out that in kernels older than 6.3, contention on two global locks (shrinker_rwsem and cgroup_mutex) under heavy load can block kernel operations, such as cgroup operations and page fault handling inside a memory cgroup, for several seconds.

(This is fixed, or at least alleviated, by kernel commit https://github.com/torvalds/linux/commit/da27f796a832122ee533c7685438dad1c4e338dd)

Although judgedaemon (runguard) cannot "fix" this issue in code, mentioning the kernel issue in the documentation could help server admins.
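For admins who want a quick indication of whether a judgehost might be affected, a check of the running kernel version could look like the sketch below. The 6.3 threshold is taken from this issue; distribution kernels may backport the fix, so a positive result only means "worth investigating". The function names are hypothetical and not part of DOMjudge.

```python
import platform
import re

# Threshold from this issue: kernels older than 6.3 may suffer the
# shrinker_rwsem / cgroup_mutex contention. Backports are not detected.
AFFECTED_BEFORE = (6, 3)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def possibly_affected(release: str) -> bool:
    """True if the kernel is older than 6.3 (tuple comparison handles 6.10 > 6.3)."""
    return parse_kernel_version(release) < AFFECTED_BEFORE


if __name__ == "__main__":
    rel = platform.release()
    if possibly_affected(rel):
        print(f"kernel {rel} is older than 6.3: consider fewer judgedaemons per machine")
    else:
        print(f"kernel {rel} is 6.3 or newer")
```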

Your environment

Steps to reproduce

Submit a correct solution many times at once, for example (fish shell syntax):

for i in $(seq 1 1000); ~/Downloads/domjudge-8.2.2/submit/submit --url http://localhost:12345/ --contest test -y G.cpp; end

Then wait for judging to finish.

Expected behaviour

Reasonable judgehost system load, and no submission takes a wall time much longer than its CPU time.
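The "wall time much longer than CPU time" condition could be checked after a rejudge with something like the sketch below. It assumes per-run CPU and wall times have already been exported from the judging results; the `Run` record, its field names and the 1-second default gap are hypothetical choices, not DOMjudge's schema.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One testcase run (hypothetical record, not DOMjudge's data model)."""
    run_id: int
    cpu_time: float   # seconds
    wall_time: float  # seconds


def suspicious_runs(runs, max_gap=1.0):
    """Return runs whose wall time exceeds their CPU time by more than max_gap seconds."""
    return [r for r in runs if r.wall_time - r.cpu_time > max_gap]


if __name__ == "__main__":
    sample = [Run(1, 0.05, 0.06), Run(2, 0.04, 3.5)]  # made-up values
    for r in suspicious_runs(sample):
        print(f"run {r.run_id}: cpu {r.cpu_time}s, wall {r.wall_time}s")
```

An absolute gap is used rather than a ratio because very short runs would otherwise trigger on tiny, harmless scheduling delays.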

Actual behaviour

The judgehost system load is at least twice the number of judgedaemons. With the timelimit overshoot set to 1s|10%, some submissions are judged as TLE even though they use very little CPU time. Judging is very slow.
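To see why a kernel stall of several seconds is enough to cause TLE, here is a sketch of how an overshoot spec like 1s|10% turns into a hard limit. It assumes DOMjudge's documented overshoot syntax ('Xs' for seconds, 'Y%' for a percentage of the time limit, joined by '+' for the sum, '|' for the max, or '&' for the min), and that the run is killed at time limit plus overshoot. The function names are hypothetical.

```python
import re


def overshoot_seconds(spec: str, timelimit: float) -> float:
    """Overshoot in seconds for a spec like '1s|10%' and a time limit in seconds.

    Assumed syntax: 'Xs' = X seconds, 'Y%' = Y percent of the time limit,
    or two terms joined by '+' (sum), '|' (max) or '&' (min).
    """
    def term(t: str) -> float:
        t = t.strip()
        if t.endswith("s"):
            return float(t[:-1])
        if t.endswith("%"):
            return timelimit * float(t[:-1]) / 100
        raise ValueError(f"bad overshoot term: {t!r}")

    m = re.fullmatch(r"\s*([^+|&]+?)\s*([+|&])\s*([^+|&]+?)\s*", spec)
    if m is None:
        return term(spec)  # single term such as '1s' or '10%'
    a, op, b = term(m.group(1)), m.group(2), term(m.group(3))
    return {"+": a + b, "|": max(a, b), "&": min(a, b)}[op]


def hard_limit(spec: str, timelimit: float) -> float:
    """Time after which a run is assumed to be killed: limit plus overshoot."""
    return timelimit + overshoot_seconds(spec, timelimit)


if __name__ == "__main__":
    # Hypothetical problem with a 1 s time limit and the overshoot from this issue.
    print(hard_limit("1s|10%", 1.0))  # 2.0
```

For a 1 s problem the hard limit is therefore only 2 s, so a correct run that uses a few milliseconds of CPU but is stalled for several seconds by the lock contention would still exceed it.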

Any other information that you want to share?

https://github.com/DOMjudge/domjudge/pull/2157 mentions that "the call cgroup_delete_cgroup_ext did sometimes hang for multiple seconds". I'm afraid a double check, by rejudging that contest, might be necessary to ensure no correct solutions were judged as TLE...

If you are interested in this specific kernel issue, I have also written a blog post (in Simplified Chinese) that explains it to the contestants affected in this regional contest and to server admins of later contests.

nickygerritsen commented 7 months ago

Thanks a lot for this big write-up. We normally advise against running many judgehosts on one machine (since there will always be some overhead), but it might indeed be worth mentioning this explicitly.

summershrimp commented 7 months ago

Since you mentioned that disabling CLONE_NEWIPC would fix this issue, how about using seccomp to restrict IPC-related syscalls rather than creating an IPC namespace?

taoky commented 7 months ago


Theoretically yes, but it would be a bit difficult to list all IPC-related syscalls, and the potential side effects of using seccomp are unknown.
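To make the seccomp idea concrete, a sketch of such a deny-list might cover exactly the resources an IPC namespace isolates: System V IPC objects and POSIX message queues. It is not exhaustive, which illustrates the difficulty mentioned above. For example, POSIX shared memory goes through ordinary file syscalls on /dev/shm and cannot be blocked by syscall name. The filter assumes libseccomp's Python bindings (module `seccomp`) are installed; the list and function names are hypothetical, not part of runguard.

```python
import errno

# x86_64 syscall names; not exhaustive (e.g. shm_open uses openat on /dev/shm).
SYSV_IPC = ("msgget", "msgsnd", "msgrcv", "msgctl",
            "semget", "semop", "semtimedop", "semctl",
            "shmget", "shmat", "shmdt", "shmctl")
POSIX_MQ = ("mq_open", "mq_unlink", "mq_timedsend", "mq_timedreceive",
            "mq_notify", "mq_getsetattr")
IPC_DENY_LIST = SYSV_IPC + POSIX_MQ


def install_ipc_filter():
    """Make the listed syscalls fail with EPERM in this process and its children.

    Requires libseccomp's Python bindings; imported lazily so the list above
    can be inspected without them.
    """
    import seccomp  # libseccomp Python bindings, not in the standard library

    f = seccomp.SyscallFilter(defaction=seccomp.ALLOW)
    for name in IPC_DENY_LIST:
        f.add_rule(seccomp.ERRNO(errno.EPERM), name)
    f.load()
```

Unlike CLONE_NEWIPC, this creates no new namespace. It only makes the listed calls fail, so programs that expect IPC to work would see errors instead of an isolated, empty IPC space.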