Open rynge opened 1 year ago
This is a bit tricky to test, since the VO/job can control the image and the bind mount options. All the mount points may be present in the test image (so no overlay/underlay is needed), and then things may change in the job. I could always test for underlay/overlay, but that would rule out running with images that already contain all the mount points on sites where overlay/underlay is not enabled. Should we add a flag that lets a VO require it?
And the problem with not exec-ing is that signal propagation would not work well, causing runaway processes when jobs are killed.
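For reference, a minimal sketch of what not exec-ing could look like while still forwarding signals to the child. This is only an illustration of the general bash pattern: `run_forwarding_signals` is a hypothetical helper, not GWMS code, and it only signals the direct child, so it does not by itself guarantee that the whole process tree inside the container goes away.

```shell
# Hypothetical helper (not GWMS code): run a command as a child process,
# forward TERM/INT to it, and return the child's exit status.
run_forwarding_signals() {
    "$@" &
    local child=$!
    # Expand $child now so the trap does not depend on the local variable later.
    trap "kill -TERM $child 2>/dev/null" TERM INT
    wait "$child"
    local rc=$?
    # wait returns early when a trapped signal arrives; keep waiting
    # until the child has really exited so we report its own status.
    while kill -0 "$child" 2>/dev/null; do
        wait "$child"
        rc=$?
    done
    trap - TERM INT
    return "$rc"
}
```

SIGKILL cannot be trapped, so a wrapper that is killed with SIGKILL would still leave the child behind; that part of the concern is not addressed by this pattern.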
We are seeing job failures in the OSPool due to a site not having overlay/underlay configured correctly:
The main problem here is that GWMS is using a feature of Singularity which was not tested during Singularity detection. A simple test like `-B $PWD:/doesnotexist` would probably have been enough to avoid this.
A secondary problem is that GWMS exec's Singularity, which means that `$_CONDOR_WRAPPER_ERROR_FILE` does not get updated. The job is thus marked as a user job failure instead of a wrapper failure (which would have restarted the job somewhere else). There might be a way to configure this behavior, but I have not found it yet.
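A rough sketch of how the two points could be combined: run the bind test before exec-ing Singularity, and record a failure in the wrapper error file while the wrapper is still in control. The function name and its arguments are hypothetical and not taken from GWMS. The sketch also assumes `/doesnotexist` is absent from the image and that the image provides `test`.

```shell
# Hypothetical pre-flight check (illustrative, not GWMS code).
# $1: path to the singularity binary, $2: image to test (both supplied by the caller)
singularity_bind_preflight() {
    local singularity_bin="$1" image="$2"
    # Bind $PWD onto a target that is assumed not to exist inside the image.
    if "$singularity_bin" exec -B "$PWD:/doesnotexist" "$image" \
            test -d /doesnotexist >/dev/null 2>&1; then
        return 0
    fi
    # Record the failure in the wrapper error file, if HTCondor provided one.
    if [ -n "$_CONDOR_WRAPPER_ERROR_FILE" ]; then
        echo "Singularity cannot bind-mount onto a missing target (overlay/underlay not working?)" \
            >> "$_CONDOR_WRAPPER_ERROR_FILE"
    fi
    return 1
}
```

A wrapper could call this just before the final exec and exit non-zero when it fails. It would have to use the job's actual image and bind options, since a test against a different image may pass where the job fails.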