Closed gmloose closed 3 years ago
The expected behavior here would be for the workflow to fail with a different exception, noting that the failing job had failed, right?
We actually hooked up CWL execution to use Toil's FileStore in f194cbfc206f12b8016009205428ba10db2d95b0. That's after the version you're using, and it more or less replaces all the file staging logic that was previously used. So we'll have to see if that ended up fixing this issue. But it still might exist with the --bypass-file-store option, even in the current development version.
Presumably you are using Toil's default --retryCount of 1, so the failed job will be retried. Maybe cwltool is doing something to the filesystem in the first run that isn't cleaned up and trips it up when it goes to stage files for the second run.
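To make that hypothesis concrete, here is a minimal sketch of how state left behind by a first attempt could break staging on the retry. It is not Toil's or cwltool's actual staging code: `stage_input`, `run_attempts`, and the file names are hypothetical, and it assumes a POSIX system where `os.symlink` is available.

```python
import os
import tempfile


def stage_input(target, link_path):
    """Hypothetical staging step: expose `target` at `link_path` via a symlink."""
    os.symlink(target, link_path)


def run_attempts(attempts):
    """Run a failing job `attempts` times without cleanup; return one outcome per attempt."""
    outcomes = []
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "input.txt")
        with open(target, "w") as handle:
            handle.write("data\n")
        link_path = os.path.join(root, "staged_input")
        for _ in range(attempts):
            try:
                stage_input(target, link_path)
                # The job itself fails, like the step that exits with status 1.
                raise RuntimeError("job exited with status 1")
            except FileExistsError:
                # The link from the previous attempt was never removed.
                outcomes.append("staging failed: link already exists")
            except RuntimeError:
                outcomes.append("job failed")
    return outcomes


# One initial run plus one retry, matching a retry count of 1.
print(run_attempts(2))
```

If something like this is what happens, the first attempt would report the real job failure and only the retry would surface the FileExistsError, which would hide the original error.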
I retried with the current master branch and the issue is gone. In that case, I do have to pass the --bypass-file-store option by the way, because the temporary output files must be stored on a shared file system. I didn't fiddle with --retryCount, so I cannot tell if increasing that would also solve the issue.
I do have to pass the --bypass-file-store option by the way, because the temporary output files must be stored on a shared file system.
I think you can also tell toil-cwl-runner to put intermediates on your shared filesystem even without bypassing the file store, with --jobStore to set the directory for files that need to move between jobs, and --workDir to set where the per-job scratch directories go.
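As a rough illustration of that suggestion, the sketch below assembles such a command line. The paths under /shared and the file names workflow.cwl and job.yml are placeholders, not taken from this issue, and `build_toil_cwl_command` is a hypothetical helper; only the --jobStore and --workDir options come from the comment above.

```python
import shlex


def build_toil_cwl_command(job_store, work_dir, workflow, job_order):
    """Assemble a toil-cwl-runner argv that keeps intermediates on a shared filesystem."""
    return [
        "toil-cwl-runner",
        "--jobStore", job_store,  # files that need to move between jobs
        "--workDir", work_dir,    # parent of the per-job scratch directories
        workflow,
        job_order,
    ]


# Placeholder locations on the shared filesystem; adjust to your site.
command = build_toil_cwl_command(
    "/shared/toil/jobstore",
    "/shared/toil/work",
    "workflow.cwl",
    "job.yml",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` to launch the workflow directly.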
Glad to hear you have this working now! I'm going to close the issue.
Symlink creation error when step fails
Problem description
In certain situations, when a step fails with a non-zero exit status, Toil tries to create symbolic links to a temporary directory and the files inside it. This fails with a FileExistsError exception. What seems to happen is the following:
Or, conceptually:
ln -s /tmp/dir dir
ln -s /tmp/dir/file dir/file
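The same clash can be sketched in Python, which is where the FileExistsError surfaces. This is only an illustration of the two `ln -s` calls, not the code path Toil takes: the directory names are stand-ins created under a temporary directory, `reproduce_symlink_clash` is a hypothetical helper, and a POSIX system is assumed.

```python
import os
import tempfile


def reproduce_symlink_clash():
    """Mimic the two `ln -s` calls; return the exception raised by the second, if any."""
    with tempfile.TemporaryDirectory() as root:
        real_dir = os.path.join(root, "tmp_dir")    # stands in for /tmp/dir
        os.mkdir(real_dir)
        real_file = os.path.join(real_dir, "file")  # stands in for /tmp/dir/file
        with open(real_file, "w") as handle:
            handle.write("data\n")

        link_dir = os.path.join(root, "dir")
        os.symlink(real_dir, link_dir)              # ln -s /tmp/dir dir

        try:
            # ln -s /tmp/dir/file dir/file: the link path resolves through
            # `dir` to the real file, which already exists.
            os.symlink(real_file, os.path.join(link_dir, "file"))
        except FileExistsError as err:
            return err
        return None


error = reproduce_symlink_clash()
print(type(error).__name__)
```

Because `dir` already points at the temporary directory, the second link would have to be created at the location of the file it targets, so the call fails rather than overwriting it.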
It is not exactly clear under which circumstances Toil tries to create these symbolic links. It appears that the following pre-conditions have to be met:
Demonstration
The workflow consists of two steps. The first step will select the first entry from a list of directories, and pass that entry as input to the second step. The second step will simply generate an exit status 1. The extra input parameter min_separation is not used, but the error only occurs when it is specified. The workflow validates without warnings and runs without errors using cwltool. When using toil-cwl-runner, I get a WorkflowException due to an unhandled FileExistsError exception. You can run the workflow as follows:
The code can be downloaded from https://github.com/gmloose/toil_symlink_bug
Software versions being used
┆Issue is synchronized with this Jira Task ┆Issue Number: TOIL-997