DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0
901 stars 240 forks source link

Toil fails to create output if file exists #4186

Open gmloose opened 2 years ago

gmloose commented 2 years ago

I noticed that Toil fails to create output if the output file already exists. It may be that this only happens in a particular case, so I will add the workflow that I used demonstrate this behavior below:

cwlVersion: v1.2
class: CommandLineTool
baseCommand: python3
arguments:
  - runtime.py

requirements:
  - class: InitialWorkDirRequirement
    listing:
      - entryname: 'runtime.py'
        entry: |
            print("runtime.outdir: $(runtime.outdir)")
            print("runtime.tmpdir: $(runtime.tmpdir)")

inputs: []

outputs:
  - id: logfile
    type: File[]
    outputBinding:
      glob: 'runtime.log'
stdout: runtime.log

If the output file runtime.log exists, it is not overwritten and Toil issues no warning whatsoever to inform the user. This is easily demonstrated by running this workflow twice:

$ for i in `seq 1 2`; do toil-cwl-runner runtime.cwl && ls -l --full-time runtime.log && cat runtime.log; done
[2022-08-24T16:45:28+0200] [MainThread] [I] [cwltool] Resolved 'runtime.cwl' to 'file:///home/marcel/temp/cwl/runtime.cwl'
[2022-08-24T16:45:29+0200] [MainThread] [I] [toil.job] Saving graph of 1 jobs, 1 new
[2022-08-24T16:45:29+0200] [MainThread] [I] [toil.job] Processing job 'CWLJob' runtime.cwl kind-CWLJob/instance-jl8uc004 v0
[2022-08-24T16:45:29+0200] [MainThread] [I] [toil] Running Toil version 5.7.1-b5cae9634820d76cb6c13b2a6312895122017d54 on host dop282.
[2022-08-24T16:45:29+0200] [MainThread] [I] [toil.leader] Issued job 'CWLJob' runtime.cwl kind-CWLJob/instance-jl8uc004 v1 with job batch system ID: 0 and cores: 1, disk: 3.0 Gi, and memory:
 2.0 Gi
[2022-08-24T16:45:29+0200] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/e87d25b825ff59bea6076e0a37466039/a40c/worker_log.txt
[2022-08-24T16:45:30+0200] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2022-08-24T16:45:31+0200] [MainThread] [I] [toil.leader] Finished toil run successfully.

Workflow Progress 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 (0 failures) [00:01<00:00, 1.66 jobs/s]
{
    "logfile": [
        {
            "location": "file:///home/marcel/temp/cwl/runtime.log",
            "basename": "runtime.log",
            "nameroot": "runtime",
            "nameext": ".log",
            "class": "File",
            "checksum": "sha1$17be0a158af6196ba1314637c334e5ab5992bb81",
            "size": 176
        }
    ]
}[2022-08-24T16:45:31+0200] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/tmp/tmpe1z63qse)
-rwxrw-r-- 1 marcel marcel 176 2022-08-24 16:45:31.661484745 +0200 runtime.log
runtime.outdir: /tmp/e87d25b825ff59bea6076e0a37466039/a40c/506d/tmppdwsq8ns/tmp-out8y77mv7i
runtime.tmpdir: /tmp/e87d25b825ff59bea6076e0a37466039/a40c/506d/tmpwu64ieuihece8e1w
[2022-08-24T16:45:32+0200] [MainThread] [I] [cwltool] Resolved 'runtime.cwl' to 'file:///home/marcel/temp/cwl/runtime.cwl'
[2022-08-24T16:45:33+0200] [MainThread] [I] [toil.job] Saving graph of 1 jobs, 1 new
[2022-08-24T16:45:33+0200] [MainThread] [I] [toil.job] Processing job 'CWLJob' runtime.cwl kind-CWLJob/instance-to5roaao v0
[2022-08-24T16:45:33+0200] [MainThread] [I] [toil] Running Toil version 5.7.1-b5cae9634820d76cb6c13b2a6312895122017d54 on host dop282.
[2022-08-24T16:45:33+0200] [MainThread] [I] [toil.leader] Issued job 'CWLJob' runtime.cwl kind-CWLJob/instance-to5roaao v1 with job batch system ID: 0 and cores: 1, disk: 3.0 Gi, and memory:
 2.0 Gi
[2022-08-24T16:45:33+0200] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/42b2e71884a8578aaa900e2bf0076916/4788/worker_log.txt
[2022-08-24T16:45:34+0200] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2022-08-24T16:45:35+0200] [MainThread] [I] [toil.leader] Finished toil run successfully.

Workflow Progress 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 (0 failures) [00:01<00:00, 1.69 jobs/s]
{
    "logfile": [
        {
            "location": "file:///home/marcel/temp/cwl/runtime.log",
            "basename": "runtime.log",
            "nameroot": "runtime",
            "nameext": ".log",
            "class": "File",
            "checksum": "sha1$c4b315966b12f8f97ea34b942678164ac8417b8f",
            "size": 176
        }
    ]
}[2022-08-24T16:45:35+0200] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/tmp/tmpshgqs59c)
-rwxrw-r-- 1 marcel marcel 176 2022-08-24 16:45:31.661484745 +0200 runtime.log
runtime.outdir: /tmp/e87d25b825ff59bea6076e0a37466039/a40c/506d/tmppdwsq8ns/tmp-out8y77mv7i
runtime.tmpdir: /tmp/e87d25b825ff59bea6076e0a37466039/a40c/506d/tmpwu64ieuihece8e1w

As you can see from the timestamp of the output file runtime.log, but also from its contents, it is not overwritten in the second run.

┆Issue is synchronized with this Jira Story ┆friendlyId: TOIL-1209

unito-bot commented 2 years ago

@gmloose can you check if this behavior also happens in cwltool?

gmloose commented 2 years ago

No, with CWLTool it runs fine. Here's the output of exactly the same command, with toil-cwl-runner replaced by cwltool:

$ for i in `seq 1 2`; do cwltool runtime.cwl && ls -l --full-time runtime.log && cat runtime.log; done
INFO /project/rapthor/Software/rapthor/bin/cwltool 3.1.20220628170238
INFO Resolved 'runtime.cwl' to 'file:///home/marcel/temp/cwl/runtime.cwl'
INFO [job runtime.cwl] /tmp/8rvkoava$ python3 \
    runtime.py > /tmp/8rvkoava/runtime.log
INFO [job runtime.cwl] completed success
{
    "logfile": [
        {
            "location": "file:///home/marcel/temp/cwl/runtime.log",
            "basename": "runtime.log",
            "class": "File",
            "checksum": "sha1$5d9963e21c2c4ac18946fb0a63a5571a225fa179",
            "size": 60,
            "path": "/home/marcel/temp/cwl/runtime.log"
        }
    ]
}
INFO Final process status is success
-rw-rw-r-- 1 marcel marcel 60 2022-08-25 11:03:34.295831565 +0200 runtime.log
runtime.outdir: /tmp/8rvkoava
runtime.tmpdir: /tmp/nn5wc6w3
INFO /project/rapthor/Software/rapthor/bin/cwltool 3.1.20220628170238
INFO Resolved 'runtime.cwl' to 'file:///home/marcel/temp/cwl/runtime.cwl'
INFO [job runtime.cwl] /tmp/ayuayyot$ python3 \
    runtime.py > /tmp/ayuayyot/runtime.log
INFO [job runtime.cwl] completed success
{
    "logfile": [
        {
            "location": "file:///home/marcel/temp/cwl/runtime.log",
            "basename": "runtime.log",
            "class": "File",
            "checksum": "sha1$67848aa15460569ec534e5eff20e5b10ba3e40c7",
            "size": 60,
            "path": "/home/marcel/temp/cwl/runtime.log"
        }
    ]
}
INFO Final process status is success
-rw-rw-r-- 1 marcel marcel 60 2022-08-25 11:03:41.739847754 +0200 runtime.log
runtime.outdir: /tmp/ayuayyot
runtime.tmpdir: /tmp/kfy85tgs