DOMjudge / domjudge

DOMjudge programming contest jury system
https://www.domjudge.org
GNU General Public License v2.0

Judgehost crash did not recover well after restart #2476

Open eldering opened 2 months ago

eldering commented 2 months ago

Description of the problem

On the WF luxor online judge, submission 3989, judging 11755 for wf46 crashed/hung with the error below. The judgedaemon was restarted, but the judging was still pending and had to be manually rejudged.

[Apr 18 14:11:21.031] judgedaemon[686632]: API request POST judgehosts/fetch-work
[Apr 18 14:11:29.921] judgedaemon[686632]: ⇝ Received 5 'judging_run' judge tasks (endpoint default)
[Apr 18 14:11:29.921] judgedaemon[686632]:   Working directory: /opt/domjudge/output/judgings/judgehost0003-2/endpoint-default/3989/11755
[Apr 18 14:11:29.921] judgedaemon[686632]:   🔓 Executing chroot script: 'chroot-startstop.sh stop'
[Apr 18 14:11:29.945] judgedaemon[686632]:   🔒 Executing chroot script: 'chroot-startstop.sh start'
[Apr 18 14:11:29.989] judgedaemon[686632]: API request GET config
[Apr 18 14:11:30.030] judgedaemon[686632]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 1.1937795567297sec... (1/3)
[Apr 18 14:11:31.265] judgedaemon[686632]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 2.117632600813sec... (2/3)
[Apr 18 14:11:33.423] judgedaemon[686632]: error: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading Retry limit reached.
[Apr 18 14:11:36.724] judgedaemon[688607]: Judge started on judgehost0003-2 [DOMjudge/8.3.0DEV/0121a2f98]

and

[Apr 18 14:11:01.708] judgedaemon[777720]: API request POST judgehosts/add-judging-run/judgehost0006-2/824666
[Apr 18 14:11:01.735] judgedaemon[777720]:   ✗  ...done in 0.015s (CPU: 0.001s), result: run-error
[Apr 18 14:11:01.735] judgedaemon[777720]: API request POST judgehosts
[Apr 18 14:11:01.777] judgedaemon[777720]:   🔙 Returned unfinished judging with jobid 11753 in my name; given back unfinished runs from me.
[Apr 18 14:11:01.777] judgedaemon[777720]: API request POST judgehosts/fetch-work
[Apr 18 14:11:01.795] judgedaemon[777720]:   🔓 Executing chroot script: 'chroot-startstop.sh stop'
[Apr 18 14:11:01.816] judgedaemon[777720]: No submissions in queue (for endpoint default), waiting...
[Apr 18 14:11:09.676] judgedaemon[777720]: ⇝ Received 5 'judging_run' judge tasks (endpoint default)
[Apr 18 14:11:09.676] judgedaemon[777720]:   Working directory: /opt/domjudge/output/judgings/judgehost0006-2/endpoint-default/3988/11754
[Apr 18 14:11:09.678] judgedaemon[777720]:   🔒 Executing chroot script: 'chroot-startstop.sh start'
[Apr 18 14:11:09.722] judgedaemon[777720]: API request GET config
[Apr 18 14:11:09.763] judgedaemon[777720]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 1.0024431142967sec... (1/3)
[Apr 18 14:11:10.807] judgedaemon[777720]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 2.0028794896802sec... (2/3)
[Apr 18 14:11:12.851] judgedaemon[777720]: error: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading Retry limit reached.
[Apr 18 14:11:16.152] judgedaemon[778589]: Judge started on judgehost0006-2 [DOMjudge/8.3.0DEV/0121a2f98]
[Apr 18 14:11:16.153] judgedaemon[778589]: Installing signal handlers
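
Both logs show the same retry pattern: a wait of about 1 s, then about 2 s, then "Retry limit reached" on the third failure, followed a few seconds later by a new judgedaemon process with a different PID. The sketch below illustrates that pattern in Python. It is not the judgedaemon's actual PHP code; `request_with_retries`, `fetch`, `RetryLimitReached`, and the jitter range are hypothetical names and values chosen for the example.

```python
import random
import time


class RetryLimitReached(Exception):
    """Raised when every attempt has failed (hypothetical name)."""


def request_with_retries(fetch, max_tries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to max_tries times, doubling the delay between attempts.

    Illustrative helper only: the names and the jitter range are assumptions,
    not taken from the DOMjudge source.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except OSError as exc:
            if attempt == max_tries:
                # No attempts left: surface the failure to the caller.
                raise RetryLimitReached(str(exc)) from exc
            # A little jitter keeps many hosts from retrying in lockstep.
            wait = delay + random.uniform(0.0, 0.2)
            sleep(wait)
            delay *= 2
```

The part that matters for this issue is what the caller does once `RetryLimitReached` propagates: if the process simply exits and is restarted, any judging it had already claimed needs to be handed back or picked up again, otherwise it stays pending as described above.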

Your environment

DOMjudge at https://onlinejudge.icpc.global/ on the wfluxor-online branch.

eldering commented 2 months ago

debug-s3989-judgehost0003.zip debug-s3989-judgehost0006.zip

vmcj commented 2 months ago

debug-s3989-judgehost0003.zip debug-s3989-judgehost0006.zip

I suspect we also need to keep the web server access/error logs for this one, as we probably ran out of PHP-FPM workers at that point.
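
To line up judgehost failures with the server-side logs, it may help to extract the exact timestamps at which each judgedaemon gave up and was restarted. The following is a rough Python sketch that assumes the log line format shown in the excerpts above; `find_restarts` and `LINE_RE` are hypothetical names, and single-digit days padded with an extra space are not handled.

```python
import re

# Matches lines such as:
# [Apr 18 14:11:33.423] judgedaemon[686632]: error: ... Retry limit reached.
LINE_RE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"judgedaemon\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def find_restarts(lines):
    """Return (failed_ts, failed_pid, restart_ts, restart_pid) tuples.

    A restart is a "Retry limit reached" error followed by a
    "Judge started" line logged by a different PID.
    """
    restarts = []
    pending = None  # (timestamp, pid) of the most recent fatal error
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m["msg"]
        pid = int(m["pid"])
        if msg.startswith("error:") and "Retry limit reached" in msg:
            pending = (m["ts"], pid)
        elif pending and msg.startswith("Judge started") and pid != pending[1]:
            restarts.append((*pending, m["ts"], pid))
            pending = None
    return restarts
```

Running this over the logs of all judgehosts would give a list of failure times to compare against the access/error logs for the same window.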