COSC1127-AI / pacman-contest-cluster

Script to run the Conquer The Flag PACMAN contest
http://ai.berkeley.edu/contest.html
Apache License 2.0

Handle filesystem errors in hosts #123

Open ssardina opened 1 year ago

ssardina commented 1 year ago

In Oct 2022 we had an issue with logs being empty and some runs not finishing on pacman-02.

It turned out the host had run out of disk space, so the system couldn't create the tmp folders for the games. Logs were cut off or left empty, and games got stuck.

The issue is in this code:

[Image: screenshot of the relevant code]
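A minimal sketch of failing loudly instead of leaving empty logs, assuming each game's working folder is created with Python's `tempfile.mkdtemp`. The names `make_game_tmp_dir`, `HostFilesystemError`, `base_dir` and `game_id` are hypothetical and not taken from the repository. A creation failure (e.g. `ENOSPC`, "no space left on device") becomes an explicit error the caller can report, rather than a game that silently never starts.

```python
import errno
import os
import tempfile


class HostFilesystemError(RuntimeError):
    """Hypothetical error type: a host could not set up a game's working files."""


def make_game_tmp_dir(base_dir: str, game_id: str) -> str:
    """Create a per-game tmp folder under base_dir, raising a clear error on failure.

    base_dir and game_id are hypothetical parameters used only for this sketch.
    """
    try:
        return tempfile.mkdtemp(prefix=f"game-{game_id}-", dir=base_dir)
    except OSError as e:
        # Disk full: report it explicitly so the cluster script can stop the run.
        if e.errno == errno.ENOSPC:
            raise HostFilesystemError(
                f"no space left on device under {base_dir!r}") from e
        # Any other filesystem problem (missing dir, permissions, ...).
        raise HostFilesystemError(
            f"cannot create tmp folder under {base_dir!r}: {e}") from e
```

The point of the wrapper is that the exception carries the host-side cause, so whatever collects results can distinguish "out of space" from a game that merely crashed.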

I wonder whether we can do better: tell the cluster script that a host is out of space and stop the whole procedure, so we know about it.

Maybe each host should check its free space and include it in the info it returns; the script can then stop and report that the host is out of space.
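A possible sketch of that check, assuming the host can run Python and send a small dict back to the cluster script (how it travels back, e.g. as output of the existing remote call, is left out). `host_status`, `check_hosts`, the `/tmp` default path and the 1 GiB `MIN_FREE_BYTES` threshold are all hypothetical choices, not part of the current script.

```python
import shutil

# Hypothetical threshold: minimum free space a host needs before games run on it.
MIN_FREE_BYTES = 1 * 1024**3  # 1 GiB


def host_status(path: str = "/tmp") -> dict:
    """Host side: report free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return {"path": path, "free_bytes": usage.free, "total_bytes": usage.total}


def check_hosts(statuses: dict, min_free: int = MIN_FREE_BYTES) -> None:
    """Script side: abort the whole run if any host reports too little free space.

    `statuses` maps host name -> dict produced by host_status() on that host.
    """
    low = {h: s["free_bytes"] for h, s in statuses.items()
           if s["free_bytes"] < min_free}
    if low:
        details = ", ".join(f"{h} ({free / 1024**2:.0f} MiB free)"
                            for h, free in sorted(low.items()))
        # SystemExit prints the message to stderr and exits with status 1.
        raise SystemExit(f"Aborting contest: low disk space on {details}")
```

Running the check before scheduling any game means the contest stops once, naming the offending hosts, instead of producing partial or empty logs.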