Mouse-Imaging-Centre / pydpiper

Python code for flexible pipeline control

better cleanup of /dev/shm #344

Open bcdarwin opened 7 years ago

bcdarwin commented 7 years ago

The issue here is that an executor crash leaves files from rotational_minctracc.py in /dev/shm (rotational_minctracc.py can't clean up after itself in certain cases, such as a hard kill signal from the scheduler). As a result, the node's ramdisk may eventually fill up.

The nicest thing would be to verify that in all possible crash situations (Ctrl-C, walltime limit, ...) the pipeline correctly cleans up (via exception handling, signal handling, etc.).
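A minimal sketch of what such cleanup could look like, not pydpiper's actual code: the `scratch_dir` helper and the `pydpiper-` prefix are hypothetical names. It removes the directory on normal exit, on exceptions (including the KeyboardInterrupt raised by Ctrl-C), and on SIGTERM. It cannot help with SIGKILL, which a process cannot catch.

```python
import contextlib
import shutil
import signal
import sys
import tempfile
import threading


def _exit_on_signal(signum, frame):
    # Turn the signal into SystemExit so that pending `finally` blocks run.
    sys.exit(128 + signum)


@contextlib.contextmanager
def scratch_dir(base="/dev/shm", prefix="pydpiper-"):
    """Yield a scratch directory under `base` and remove it on the way out.

    Hypothetical helper: the name and prefix are illustrative only.
    """
    # Signal handlers can only be installed from the main thread.
    in_main = threading.current_thread() is threading.main_thread()
    previous = signal.signal(signal.SIGTERM, _exit_on_signal) if in_main else None
    path = tempfile.mkdtemp(prefix=prefix, dir=base)
    try:
        yield path
    finally:
        # Runs on normal exit, exceptions, Ctrl-C, and SIGTERM; never on SIGKILL.
        shutil.rmtree(path, ignore_errors=True)
        if in_main:
            signal.signal(signal.SIGTERM, previous)
```

Usage would be along the lines of `with scratch_dir() as d: ...`, with the work writing its intermediate files under `d`. Whether a walltime limit arrives as SIGTERM first or directly as SIGKILL depends on the scheduler, so this only covers part of the problem.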

A simple workaround would be for each executor to create a subdirectory of /dev/shm and take a file lock on it via the flock syscall. One could then (e.g., at executor start/exit) look for appropriately named directories lacking a lock (signifying the executor responsible has exited and the OS kernel has released the lock) and delete them. (Obviously there's a potential race condition here ...)
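The workaround could be sketched roughly as follows. This is an illustration under assumptions, not existing pydpiper code: the `pydpiper-exec-` prefix, the `.lock` file name, and both function names are made up here. The lock is taken on a file inside the directory, using Python's `fcntl.flock`.

```python
import fcntl
import os
import shutil
import tempfile

PREFIX = "pydpiper-exec-"  # hypothetical naming convention
LOCK_NAME = ".lock"        # hypothetical lock file inside each directory


def claim_scratch_dir(base="/dev/shm"):
    """Create a scratch directory and hold an exclusive flock on a file in it.

    Returns (path, lock_fd). The caller keeps lock_fd open for the executor's
    lifetime; the kernel releases the lock when the process exits, however
    it exits.
    """
    path = tempfile.mkdtemp(prefix=PREFIX, dir=base)
    fd = os.open(os.path.join(path, LOCK_NAME), os.O_CREAT | os.O_RDWR, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return path, fd


def sweep_stale_dirs(base="/dev/shm"):
    """Delete prefixed directories whose lock nobody holds; return their paths."""
    removed = []
    for name in os.listdir(base):
        path = os.path.join(base, name)
        if not name.startswith(PREFIX) or not os.path.isdir(path):
            continue
        try:
            fd = os.open(os.path.join(path, LOCK_NAME), os.O_RDWR)
        except OSError:
            continue  # no lock file (yet): leave the directory alone
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            continue  # lock is held, so the owning executor is still alive
        else:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
        finally:
            os.close(fd)
    return removed
```

The race mentioned above is visible here: a sweeper that runs between another executor's `os.open` and `fcntl.flock` calls in `claim_scratch_dir` can acquire the lock and delete a directory that is about to be used.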

Similar remarks probably apply to /tmp.

gdevenyi commented 7 years ago

Properly configured queuing systems manage the $TMPDIR variable themselves, and the job runner cleans up the temporary files after the job completes. So this shouldn't be needed for /tmp if you honour the $TMPDIR environment variable.
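For illustration, honouring $TMPDIR from Python could look like the sketch below. The `job_scratch_dir` name and the `pydpiper-` prefix are hypothetical. `tempfile.gettempdir()` already consults TMPDIR, but it caches its answer, so the variable is read explicitly here.

```python
import os
import tempfile


def job_scratch_dir(prefix="pydpiper-"):
    """Create a scratch directory under $TMPDIR, falling back to the default.

    Hypothetical helper: with a scheduler that sets and cleans $TMPDIR per
    job, anything created here is removed when the job ends.
    """
    base = os.environ.get("TMPDIR") or tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```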