wresch closed this issue 3 years ago
Hi @wresch, this looks like a race condition between container exit and cleanupd: cleanupd removes the directory before the container has completely exited. strace slows down the traced process, which could explain why cleanupd behaves normally when traced.
Is this issue annoying enough to require a fix in the 2.4.3 release? Either way, it will be fixed in the next major release.
Ah - I should have thought of that. Well, this caused a bit of trouble on our cluster - a user was running a Toil pipeline with a bunch of docker containers, and the leftover runtime directories of that one particular container ended up trashing /tmp on a bunch of nodes; it was made worse by a (now fixed) bug in our /tmp cleanup code. The final impact for us kind of depends on how many docker containers behave that way. The impact for the user is that he has to either find/build a different container or clean up manually. If it were me, I'd say a race condition like that should probably be fixed in the next micro release, but I don't know how long your to-do list is.
@wresch Could you test the above PR to see if it fixes the issue?
The 50ms delay seems to do the trick in this case:
Stock version - replicating faulty behavior
$ singularity -vvv run 'docker://quay.io/biocontainers/vcflib:1.0.0_rc1--0'
....
LOG : USER=wresch, IMAGE='vcflib:1.0.0_rc1--0', COMMAND='run'
VERBOSE: Starting runscript
Singularity> exit
VERBOSE: Cleaning directory: /tmp/.singularity-runtime.JtGQCkTC
WARNING: Failed removing file: /tmp/.singularity-runtime.JtGQCkTC/quay.io/biocontainers/vcflib:1.0.0_rc1--0/dev
ERROR : Could not remove directory /tmp/.singularity-runtime.JtGQCkTC: Device or resource busy
ABORT : Retval = 255
Patched version (PR #1265):
$ bin/singularity -vvv run 'docker://quay.io/biocontainers/vcflib:1.0.0_rc1--0'
...
LOG : USER=wresch, IMAGE='vcflib:1.0.0_rc1--0', COMMAND='run'
VERBOSE: Starting runscript
Singularity> exit
VERBOSE: Cleaning directory: /tmp/.singularity-runtime.zcKMsRl2
Tested multiple times with runtime dir on local disk (/tmp), on NFS, and on GPFS (via setting SINGULARITY_LOCALCACHEDIR).
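For reference, the NFS/GPFS runs looked roughly like this (the path below is just a placeholder for our scratch filesystem, not the real location):
$ mkdir -p /gpfs/scratch/$USER/sing-runtime
$ export SINGULARITY_LOCALCACHEDIR=/gpfs/scratch/$USER/sing-runtime
$ bin/singularity -vvv run 'docker://quay.io/biocontainers/vcflib:1.0.0_rc1--0'
$ ls -d "$SINGULARITY_LOCALCACHEDIR"/.singularity-runtime.* 2>/dev/null   # prints nothing after a clean exit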
I am hit by this issue really hard. Most of our containers are giant (3-5G) and (probably because of the size??) singularity is failing to clean up the /tmp/.singularity-* directories, which then causes /tmp on our cluster nodes to run out of disk space.
@soichih can you test that the release-2.4 branch fixes this issue for you please?
@GodloveD
I am still seeing this issue as of 2.4.2-dist
hayashis@karst(h2):~ $ singularity --version
2.4.2-dist
Does 2.4.2-dist contain the fix you mentioned?
Hi @soichih. No, 2.4.2-dist doesn't have it, but I believe we put it into the branch that is slated to become 2.4.3. Right now that is in release-2.4. Are you able to test that branch?
I don't have sudo access to our HPC cluster, so I've tried to recreate the problem on my Ubuntu dev VM using both 2.4.1-dist from NeuroDebian and release-2.4 from this GitHub repo (./configure && make install).
I've run my containers many times on both versions, but I couldn't recreate the problem on this VM. When singularity exits, it successfully removes the .singularity-runtime.**** directory.
I will try repeating my test on another slurm cluster that I do have sudo access to.
I did notice, however, that if I stop singularity while it's in the "Creating container runtime..." stage, the .singularity-runtime directory remains in /tmp. So .singularity-runtime directories could be left behind and pile up if 1) the HPC scheduler kills the job (or it gets preempted by other jobs, etc.) while the runtime is being created, or 2) /tmp fills up during container creation and singularity crashes.
Does cleanup not happen if singularity is killed while it's creating the container runtime?
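For illustration, this is roughly how I interrupt it on my VM, using the container from this thread as an example (the sleep is a guess - the kill just needs to land while it's still in the "Creating container runtime..." stage):
$ singularity exec docker://quay.io/biocontainers/vcflib:1.0.0_rc1--0 true &
$ pid=$!
$ sleep 5             # long enough to still be inside "Creating container runtime..."
$ kill -TERM "$pid"
$ ls -d /tmp/.singularity-runtime.* 2>/dev/null   # the leftover directory shows up here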
This issue is still happening after I upgraded singularity to 2.5.2-dist. I think it's probably related to the use of docker containers (singularity exec docker://somecontainer ...), but I am not sure.
Is there a way to capture the runtime directory path? If so, I can try adding "rm -rf /tmp/.singularity-runtime.$id" to our batch scheduler epilogue to force it to be cleaned up - something like the sketch below. Does anyone have any suggestions?
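This minimal epilogue sketch is roughly what I have in mind (the one-hour age threshold and the /tmp path are arbitrary, and it would also catch leftovers from other users' jobs):
#!/bin/bash
# epilogue sketch: remove stale singularity runtime directories by pattern and age,
# since the random suffix of the directory name isn't easily captured per job
find /tmp -maxdepth 1 -name '.singularity-runtime.*' -mmin +60 -exec rm -rf -- {} +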
OK, I did a bit more digging. The issue seems to be caused by cleanupd getting killed by batch schedulers after a job timeout. Here is the sequence of events:
1) The job starts up singularity with some large container (>1G?).
2) The PBS cluster detects a walltime violation for the job and sends SIGTERM to the singularity process.
3) The singularity process dies, releasing the cleanup trigger flock. cleanupd acquires the flock and proceeds with cleanup (I see the "Cleaning directory: ..." message).
4) Soon after 3), the PBS cluster also sends SIGTERM to cleanupd, and cleanupd dies before it finishes cleaning.
5) /tmp is left with .singularity-runtime.* (and .singularity-cleanuptrigger.*).
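To convince myself of this sequence I can mimic it by hand outside the scheduler, roughly like this (timings are guesses, the container is just the example from this thread, and I'm assuming the cleanup daemon shows up with "cleanupd" in its command line):
$ singularity exec docker://quay.io/biocontainers/vcflib:1.0.0_rc1--0 sleep 600 &
$ pid=$!
$ sleep 120                 # 1) let it finish creating the runtime directory
$ kill -TERM "$pid"         # 2)+3) scheduler-style SIGTERM to singularity
$ sleep 1
$ pkill -TERM -f cleanupd   # 4) SIGTERM to cleanupd before it finishes cleaning
$ ls -d /tmp/.singularity-runtime.* /tmp/.singularity-cleanuptrigger.* 2>/dev/null   # 5) leftovers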
Both PBS and Slurm send SIGTERM followed by SIGKILL if a process won't die (for Slurm the default gap is 30 seconds; not sure about PBS). I am thinking that if cleanupd is updated to handle SIGTERM instead of just terminating, it could give the s_rmdir function more time to do its job before being killed by SIGKILL. (Cluster admins should also be advised to configure a long enough delay between the two signals?)
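Conceptually, the behavior I'm suggesting looks like this shell sketch (the real cleanupd is C; the directory argument and exit code here are just placeholders):
#!/bin/bash
# sketch of the idea, not the actual cleanupd code: note SIGTERM instead of
# dying, finish removing the runtime directory, then exit
runtime_dir="$1"                # hypothetical: runtime dir path passed in by the parent
term_received=0
trap 'term_received=1' TERM     # defer SIGTERM; only SIGKILL can interrupt the removal now
rm -rf -- "$runtime_dir"        # stand-in for the s_rmdir() call
if [ "$term_received" -eq 1 ]; then
    exit 143                    # honor the deferred SIGTERM once cleanup is done
fi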
Another related issue with cleanupd is that it only starts up after the runtime directory has finished being created. If a job is killed while the docker layers are being exploded, /tmp is left with .singularity-runtime.* and .singularity-layers.* directories (I am seeing this on 2.5.2-dist). I believe cleanupd should be started before the runtime directory is created, and it should clean up .singularity-layers.* as well as .singularity-runtime.*.
Hello,
This is a templated response that is being sent out to all open issues. We are working hard on 'rebuilding' the Singularity community, and a major task on the agenda is finding out what issues are still outstanding.
Please consider the following:
Thanks, Carter
Didn't realize this was still open. As far as I'm concerned, I don't see any issues with cleanup in 3.7.3 with this container, and we haven't encountered any more issues with singularity trashing /tmp in a long time. The other comments are 3 years old as well - I'll close.
Version of Singularity:
2.4.1
Expected behavior
/tmp/.singularity-runtime.* should be cleaned up when exiting container. This only occurs for a specific docker container so I'm not entirely sure that this is a singularity bug.
Actual behavior
not actually cleaned up
Steps to reproduce behavior
The really funny thing is that attaching strace to cleanupd after starting the container makes this behavior revert to expected.
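In another shell, attach strace to the cleanup daemon - roughly like this (not my exact command; it assumes the daemon appears with "cleanupd" in its command line, and it may need sudo depending on ptrace restrictions):
$ strace -f -p "$(pgrep -f -n cleanupd)"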
Then exit the shell in the container.
Now the runtime dir does get cleaned up as it should, which the strace output also shows.
odd, right?