Closed wdesouza closed 5 months ago
Grepping around the Toil source code, it does look like there are still a few files in /tmp created by Toil itself. But there shouldn't be many of them, and they should be pretty small.
How big are these files? They might be created by your CWL workflow rather than Toil itself.
You could try setting the environment variable TMPDIR=/home/welliton/tmp before running; that should force everything to go to the directory you want.
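Some background on why TMPDIR can matter: Toil is written in Python, and Python's standard `tempfile` module checks `TMPDIR`, then `TEMP`, then `TMP`, silently skipping any candidate that does not exist or is not writable. Whether a given Toil code path actually goes through `tempfile` is an assumption here, not something confirmed in this thread. A minimal sketch of that lookup (the helper name `resolve_tempdir` is made up for illustration):

```python
import os
import tempfile


def resolve_tempdir(candidate):
    """Return the directory tempfile would choose with TMPDIR=candidate."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = candidate
        tempfile.tempdir = None  # clear the cached choice so TMPDIR is re-read
        return tempfile.gettempdir()
    finally:
        # Restore the process state so the probe has no side effects.
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env
        tempfile.tempdir = saved_cache
```

If `resolve_tempdir("/home/welliton/tmp")` returns something other than that path, the directory probably does not exist yet or is not writable, so it is worth creating it before setting the variable.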
I am processing multiple FASTQ files (total size 54 GB) using Docker containers. CWL file: https://github.com/labbcb/tool-rqc/blob/master/Rqc.cwl
First, Toil created the directory /tmp/tmpwzXyvO/ with 56 GB.
Then Toil created the directory /home/welliton/tmp/toil-f0b247e8-b6b8-4fda-995e-09ff3f10988f-a8c12647ed20e97601dcb817551088b8 with 54 GB.
I guess Toil is copying the input files twice before executing the workflow step. After the workflow completed, Toil deleted both temporary directories.
I tested with the environment variable, but Toil still creates a temporary directory in /tmp.
@Welliton309 If Toil is using hardlinks then it might not be making whole new copies
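To illustrate the hardlink point: two directory entries that share an inode occupy the space only once, so adding up per-directory totals can overstate real usage. Hardlinks cannot span filesystems, so this only applies when both paths are on the same partition. A sketch with a hypothetical helper `unique_bytes`; this is not Toil's own accounting:

```python
import os


def unique_bytes(*roots):
    """Sum file sizes under roots, counting each (device, inode) pair once."""
    seen = set()
    total = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    total += st.st_size
    return total
```

Calling it once with both directories, for example `unique_bytes(dir_a, dir_b)`, gives the combined footprint without double counting hardlinked files.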
@mr-c I have used the commands du -hcs <dir> and df -h. I noticed this behavior because Toil failed with the error message "no free disk space". I had to clean up /tmp and run the workflow again. In my case the /home directory is on a different partition and has more free disk space than /tmp.
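A quick pre-flight check along the lines of df -h can be scripted before launching the workflow. This is only a sketch: `has_room` is a hypothetical helper, and it says nothing about how much scratch space Toil will really need.

```python
import shutil


def has_room(path, required_bytes):
    """Return True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes
```

For the inputs described here, something like `has_room("/tmp", 54 * 10**9)` (taking 1 GB as 10**9 bytes, which is an assumption) would flag the problem before the run fails partway through.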
@Welliton309 Okay -- I suspected you had checked but I wanted to be sure.
We encountered this problem as well. To repro, use this .cwl 'tool' that prints the current directory:
#!/usr/bin/env cwl-runner
#
# This sample workflow simply prints the current directory
#
cwlVersion: v1.0
class: CommandLineTool
baseCommand: pwd
inputs: []
stdout: stdout.txt
outputs:
  - id: stdout
    type: File
    outputBinding:
      glob: stdout.txt
Run:
toil-cwl-runner pwd.cwl
The result is a file stdout.txt containing the path to the default temp dir. We should be able to override it with:
toil-cwl-runner --workDir /some/other/path pwd.cwl
However, the path in stdout.txt is the same.
On one platform it works to define the environment variables TMP, TEMP, and TMPDIR to the desired path, but this workaround doesn't work universally.
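For anyone who wants to try the environment-variable workaround from a wrapper script, here is a minimal sketch that sets all three variables for a child process and checks what a child Python resolves. The helper names `env_with_tempdir` and `child_tempdir` are made up, and the probe only shows what Python's `tempfile` picks, not what Toil or the CWL tool will do.

```python
import os
import subprocess
import sys


def env_with_tempdir(path):
    """Copy the current environment with TMPDIR, TEMP and TMP all set to path."""
    env = dict(os.environ)
    for name in ("TMPDIR", "TEMP", "TMP"):
        env[name] = path
    return env


def child_tempdir(path):
    """Ask a child Python process which temp directory it resolves."""
    probe = "import tempfile; print(tempfile.gettempdir())"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env_with_tempdir(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The same `env_with_tempdir(...)` dictionary can be passed as `env=` when launching `toil-cwl-runner --workDir /some/other/path pwd.cwl` through `subprocess.run`.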
➤ Adam Novak commented:
We don’t think that this is still likely to be a problem, but we’ll check since we’re revising workdir selection for another issue.
Seems like this is no longer an issue. Running toil-cwl-runner --workDir test_subdir pwd.cwl, my stdout.txt outputs:
/home/heaucques/Documents/toil-examples/test_subdir/toilwf-a4200ceea6145dc0abb756e6f4516be9/c578/job/tmpxd5xlrrv/tmp-outc3gxkh0a
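If someone wants to turn this check into a regression test, comparing the reported path against the requested work directory is enough. A minimal sketch, with `is_under` as a hypothetical helper name:

```python
import os


def is_under(path, root):
    """Return True if path is root itself or lies somewhere beneath it."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    return os.path.commonpath([path, root]) == root
```

Reading the path out of stdout.txt and asserting `is_under(reported, work_dir)` would fail in the originally reported case, where the path pointed at the default temp dir instead.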
➤ Adam Novak commented:
stxue says this is no longer a problem.
I am testing toil-cwl-runner with the --workDir parameter to avoid temporary files in the /tmp directory, but it is not working. Toil still writes temporary files to the /tmp directory. Is it possible to instruct Toil to write all temporary and cached files to a user-specified directory?
Command line:
Toil version: 3.11.0
Issue is synchronized with this Jira Story. Issue Number: TOIL-206