Open adrabent opened 1 year ago
Issue still persists. Is there any update on that issue?
Cheers, Alex
Hey Alex; are you using cwltool --parallel
or other options?
Dear @mr-c,
yes I call the workflow like this:
cwltool --parallel --outdir $PWD/results/ --tmpdir-prefix $PWD/tmp/ <workflow> <input_json>
By workflow definition the step that complains can only run IF the step where the "reader" is active has finished completely. So there is no way that there may be a conflict (step is read-only anyways, otherwise this behaviour would be expected).
Cheers, Alex
I think the issue comes down to the following:
input
is used by step 1 to produce output output
. Step 1 does not change or modify input
.input
and output
are then used by step 2. Step 2 does modify input
through InplaceUpdateRequirement
.input
.The CWL specification mentions that
Downstream steps which further process the file must use the output of previous steps, and not refer to a common input (this is necessary for both ordering and correctness).
There is no possibility for step 2 to be run before step 1: the former depends on the output of the latter. Step 1 is therefore upstream from step 2. Therefore, it isn't clear why this workflow fails. As an explicit example see the workflow below. In this example, step1 is extract_text
and step2 is change_file
.
cwlVersion: v1.2
class: Workflow
inputs: []
outputs: []
steps:
create_file:
in: []
out: [file]
run:
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [echo, "text"]
stdout: file.txt
inputs: []
outputs:
file: stdout
extract_text:
in:
input:
source: create_file/file
out: [text]
run:
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cat
stdout: text.txt
inputs:
input:
type: File
inputBinding:
position: 0
outputs:
text:
type: string
outputBinding:
glob: text.txt
loadContents: true
outputEval: $(self[0].contents)
change_file:
in:
input:
source: create_file/file
text:
source: extract_text/text
out: []
run:
cwlVersion: v1.2
class: CommandLineTool
baseCommand: mv
inputs:
input:
type: File
inputBinding:
position: 0
text:
type: string
inputBinding:
position: 1
outputs:
output:
type: File
outputBinding:
glob: $(inputs.text)
requirements:
InplaceUpdateRequirement:
inplaceUpdate: true
InitialWorkDirRequirement:
listing:
- entry: $(inputs.input)
writable: true
Running this gives me
$ cwltool test.cwl
INFO /home/tjlv53/.local/bin/cwltool 3.1.20240708091337
INFO Resolved 'test.cwl' to 'file:///home/tjlv53/test.cwl'
INFO [workflow ] start
INFO [workflow ] starting step create_file
INFO [step create_file] start
INFO [job create_file] /tmp/u_5hcdt_$ echo \
test > /tmp/u_5hcdt_/file.txt
INFO [job create_file] completed success
INFO [step create_file] completed success
INFO [workflow ] starting step extract_text
INFO [step extract_text] start
INFO [job extract_text] /tmp/wkuwqckd$ cat \
/tmp/wkuwqckd/file.txt > /tmp/wkuwqckd/text.txt
INFO [job extract_text] completed success
INFO [step extract_text] completed success
INFO [workflow ] starting step change_file
INFO [step change_file] start
ERROR Exception on step 'change_file'
ERROR [step change_file] Cannot make job: [job change_file] wants to modify file:///tmp/u_5hcdt_/file.txt from generation 0 but current generation is 1 (last updated by extract_text)
INFO [workflow ] completed permanentFail
{}WARNING Final process status is permanentFail
What if you take have the input of the step1 pass through so it is also an (unmodified) output of step1, and then use both outputs of step1 in step2?
In my example that works if I add an InitialWorkdirRequirement
for it. But that makes the workflow more complicated than it needs to be, and at least to me it isn't clear from the specification why this is necessary. There is no chance of step 2 being executed before step 1.
The correctness checking (that you are not potentially writing to a file that could be read by a different step at the same time) is very simple. It checks that either there's exactly one step connected for update and no other steps, or that all the steps are connected for read only. It does not take implied step sequence into account. The algorithm would need to be improved.
Expected Behavior
cwltool should pass and not through an error
Actual Behavior
cwltool throws an error
Workflow Code
And this is the subworkflow:
Steps for reference: compare_station_list.cwl dp3_addcol.cwl
Full Traceback
Your Environment