bcdarwin opened 7 years ago
Properly configured queuing systems manage the $TMPDIR variable so that the job runner cleans up the temporary files after the job completes. This therefore shouldn't be needed for /tmp, provided you honour the $TMPDIR environment variable.
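Honouring $TMPDIR from Python mostly comes down to not hard-coding "/tmp". A minimal sketch, assuming the pipeline creates its scratch space with the standard library's tempfile module (the helper name and prefix below are hypothetical): tempfile.gettempdir() consults $TMPDIR, then $TEMP, then $TMP, before falling back to platform defaults such as /tmp, so anything created without an explicit dir= argument lands in the scheduler-managed directory.

```python
import os
import tempfile


def make_job_scratch_dir(prefix: str = "pipeline-") -> str:
    """Create a private scratch directory under the temp root.

    Hypothetical helper. tempfile.mkdtemp() with no ``dir`` argument uses
    tempfile.gettempdir(), which checks $TMPDIR, $TEMP and $TMP before
    falling back to /tmp, so a scheduler-managed $TMPDIR is honoured.
    """
    return tempfile.mkdtemp(prefix=prefix)


if __name__ == "__main__":
    scratch = make_job_scratch_dir()
    print("TMPDIR =", os.environ.get("TMPDIR", "<unset>"))
    print("scratch dir:", scratch)
    os.rmdir(scratch)  # the directory is still empty, so rmdir suffices
```

Note that gettempdir() caches its answer in tempfile.tempdir after the first call, so $TMPDIR has to be set before the process first asks for it (which is the normal situation when the scheduler sets it).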
The issue here is that an executor crash leaves files from rotational_minctracc.py behind in /dev/shm (rotational minctracc can't clean up after itself in certain cases, such as a hard kill signal from the scheduler). Presumably, the node's ramdisk could eventually fill up as a result.
The nicest solution would be to verify that the pipeline cleans up correctly in all possible crash situations (Ctrl-C, walltime limit, ...), via exception handling, signal handling, etc.
A simple workaround would be for each executor to create a subdirectory of /dev/shm and hold a file lock on it via the flock syscall. One could then (e.g., at executor start or exit) look for appropriately named directories that are no longer locked, signifying that the responsible executor has exited and the kernel has released its lock, and delete them. (Obviously there's a potential race condition here ...) Similar remarks probably apply to /tmp.
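The flock scheme could look roughly like the following sketch. Everything named here (PREFIX, claim_scratch_dir, sweep_stale_dirs, the ".lock" file) is hypothetical. To narrow the race mentioned above, it swaps in one extra step: the directory is created under a staging name, locked, and only then renamed into the swept namespace, so a sweeper should never see a live executor's directory without a held lock. The kernel releases a flock lock when the last descriptor for that open file is closed, which happens on process exit however it occurs, including SIGKILL.

```python
import fcntl
import os
import shutil
import tempfile

# Hypothetical naming scheme; nothing in the pipeline defines these today.
PREFIX = "pipeline-executor-"
STAGING_PREFIX = "pipeline-staging-"
LOCK_NAME = ".lock"


def claim_scratch_dir(root: str = "/dev/shm") -> tuple[str, int]:
    """Create a scratch dir under root and hold an exclusive flock on it.

    The returned fd must stay open for the executor's whole lifetime.
    """
    staging = tempfile.mkdtemp(prefix=STAGING_PREFIX, dir=root)
    fd = os.open(os.path.join(staging, LOCK_NAME), os.O_CREAT | os.O_RDWR, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)
    # Publish under the swept prefix only once the lock is held.
    suffix = os.path.basename(staging)[len(STAGING_PREFIX):]
    final = os.path.join(root, PREFIX + suffix)
    os.rename(staging, final)  # the held lock follows the inode
    return final, fd


def release_scratch_dir(path: str, fd: int) -> None:
    """Normal shutdown: delete our own directory, then drop the lock."""
    shutil.rmtree(path, ignore_errors=True)
    os.close(fd)


def sweep_stale_dirs(root: str = "/dev/shm") -> list[str]:
    """Delete PREFIX-named dirs whose lock is no longer held; return them."""
    removed = []
    for name in os.listdir(root):
        if not name.startswith(PREFIX):
            continue
        path = os.path.join(root, name)
        try:
            fd = os.open(os.path.join(path, LOCK_NAME), os.O_RDWR)
        except OSError:
            continue  # vanished meanwhile, no lock file, or not ours
        try:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                continue  # someone holds the lock: the owner is alive
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
        finally:
            os.close(fd)
    return removed


if __name__ == "__main__":
    root = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
    print("swept at startup:", sweep_stale_dirs(root))
    path, fd = claim_scratch_dir(root)
    try:
        pass  # executor work would write its temporary files under `path`
    finally:
        release_scratch_dir(path, fd)
```

Two caveats: a process forked while holding the descriptor keeps the lock alive until it also exits, and an executor killed between mkdtemp and rename would still leave a staging directory behind (a much smaller window than before).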