Closed evan-wehi closed 7 years ago
@evan-wehi As a workaround, what happens if adapters
becomes a workflow level input with a default value? That would be more portable anyhow regardless.
@mr-c I’ll try that and let you know. I agree this is not portable and should not be part of the workflow. But also it should not be part of the input. I’d like to isolate references and such in something like a site-config.yml but I haven’t had a chance to figure out how to do that in CWL.
@evan-wehi I understand. This is where we get into the differences between "run a workflow" and "be a comprehensive workflow platform" start to become apparent. Toil is a great workflow runner, but doesn't (yet) try to be a comprehensive workflow platform.
@mr-c
I've just spent two frustrating hours trying to move the adapter location into the input. Seems like it should be trivial but I can't figure it out.
Here is what I have:
in my input:
adaptersx:
class: File
path: "/stornext/System/data/apps/trimmomatic/trimmomatic-0.36/adapters/TruSeq3-PE.fa"
In the worflow input section:
inputs:
read1: File
read2: File
adaptersx: File
mouse-ref: string
human-ref: string
The step itself:
trim:
run: tools/trimmomatic.cwl
in:
reads1:
source: read1
valueFrom: >
${
self.format = "http://edamontology.org/format_1930";
return self;
}
reads2:
source: read2
valueFrom: >
${
self.format = "http://edamontology.org/format_1930";
return self;
}
end_mode:
default: PE
nthreads:
valueFrom: ${ return 2; }
illuminaClip:
default:
adapters: adaptersx
seedMismatches: 1
palindromeClipThreshold: 20
simpleClipThreshold: 20
minAdapterLength: 4
keepBothReads: "true"
out: [output_log, reads1_trimmed, reads1_trimmed_unpaired, reads2_trimmed_paired, reads2_trimmed_unpaired]
trimmomatic.cwl is essentially the same as the one in the user contribution repo. The reference implementation reports:
$ cwltool pdx-pl.cwl pdx-inp.yml
/home/thomas.e/.local/bin/cwltool 1.0.20170413194156
Resolved 'pdx-pl.cwl' to 'file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/test/pdx-pl.cwl'
No handlers could be found for logger "rdflib.term"
Exception on step 'trim'
[step trim] Cannot make job: Invalid job input record:
the `illuminaClip` field is not valid because
tried illuminaClipping but
pdx-pl.cwl:134:11: the `adapters` field is not valid because
pdx-pl.cwl:134:11: is not a dict
[workflow pdx-pl.cwl] outdir is /home/thomas.e/tmp/tmprfKB6K
{}
Final process status is permanentFail
@evan-wehi Upon further reflection, we don't have a way to create a custom record that references a particular input in CWL v1.0.1.
I've created an issue to track this at https://github.com/common-workflow-language/common-workflow-language/issues/493
Here's how to work around it via an ExpressionTool
using this Trimmomatic CWL description
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
SchemaDefRequirement:
types:
- $import: trimmomatic-illumina_clipping.yaml
inputs:
adapters:
type: File
default:
class: File
path: /stornext/System/data/apps/trimmomatic/trimmomatic-0.36/adapters/TruSeq3-PE.fa
steps:
package_Illumina_clipping:
in: { illumina_adapters: adapters }
run:
class: ExpressionTool
requirements: { InlineJavascriptRequirement: {} }
inputs: { illumina_adapters: File }
expression: |
${ var illuminaClip = {
"adapters": inputs.illumina_adapters,
"seedMismatches": 1,
"palindromeClipThreshold": 20,
"simpleClipThreshold": 20,
"minAdapterLength": 4,
"keepBothReads": "true" };
return { "illuminaClip": illuminaClip };
}
outputs: { illuminaClip: trimmomatic-illumina_clipping.yaml#illuminaClipping }
out: [ illuminaClip ]
outputs:
illuminaClip:
type: trimmomatic-illumina_clipping.yaml#illuminaClipping
outputSource: package_Illumina_clipping/illuminaClip
Not a very satisfactory solution, but that should work.
@mr-c Thanks, I’ll try and give it a whirl tomorrow. Unfortunately, I can’t install working cwl reference implementation at the moment. (Issue opened in that repository)
Evan.
@mr-c After several more hours I did get the trimmomatic step working. The workflow executes to completion with the reference implementation and the trimmomatic step executes with toil. I can't help wondering (without understanding the CWL design) why, essentially transporting a string to a command line, needs to be this complicated.
I know have this problem:
Failed to evaluate expression:
Expression evaluation error:
evalmachine.<anonymous>:61
ret["secondaryFiles"] = [inputs.bamIndex];
^
TypeError: Cannot set property 'secondaryFiles' of undefined
at evalmachine.<anonymous>:61:25
at evalmachine.<anonymous>:63:3
at ContextifyScript.Script.runInContext (vm.js:35:29)
at ContextifyScript.Script.runInNewContext (vm.js:41:15)
at Object.runInNewContext (vm.js:93:38)
at Socket.<anonymous> ([eval]:10:55)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:176:18)
at Socket.Readable.push (_stream_readable.js:134:10)
This expression appears in this step:
#
# Gather human bam and index into one dependency
#
gather:
run:
class: ExpressionTool
inputs:
bamFile:
type: File
bamIndex:
type: File
outputs:
combined:
type: File
expression: >
${
var ret = inputs.bamFile;
ret["secondaryFiles"] = [inputs.bamIndex];
return {"combined" : ret};
}
in:
bamFile:
source: sort-human/sorted
bamIndex:
source: index-human/index
out: [combined]
which is a hack I picked up somewhere to make sure that a bam index travels along with the original bam file (the reference implementation requires to execute correctly).
Any insight would be much appreciated.
Hey @evan-wehi
Thank you for your patience. Running trimmomatic is essentially like configuring another workflow system, but from the command line. As a tool it is an outlier and the source of much frustration across all workflow systems -- the Galaxy wrapper is 500 lines, for example: https://github.com/fls-bioinformatics-core/galaxy-tools/blob/master/tools/trimmomatic/trimmomatic.xml
That's an interesting hack. A simpler method is to output the BAM with the index as a secondaryFile in the step where the index is created: https://github.com/common-workflow-language/workflows/blob/master/tools/samtools-index.cwl#L23
@mr-c
Several more hours and this bit is now work and I have in fact actually managed to get toil run the workflow to completion on some test data. Thank you for your help! I do dispute the claim that trimmomatic is an outlier. I find that everything in CWL is difficult and highly non-obvious.
One last thing (for this thread). The JS to build the illuminaClip object needs to be
var illuminaClip = {
"adapters": inputs.adapters,
"seedMismatches": 1,
"palindromeClipThreshold": 20,
"simpleClipThreshold": 20,
"minAdapterLength": 4,
"keepBothReads": true };
return { "illuminaClip": illuminaClip };
for toil but
var illuminaClip = {
"adapters": inputs.illumima_adapters,
"seedMismatches": 1,
"palindromeClipThreshold": 20,
"simpleClipThreshold": 20,
"minAdapterLength": 4,
"keepBothReads": true };
return { "illuminaClip": illuminaClip };
for the reference implementation. It would be nice if they could be the same.
@evan-wehi I am truly sorry to hear that you've had such a frustrating time with CWL, the CWL reference implementation, and Toil.
The only difference in your JS is
adapters": inputs.adapters,
vs "adapters": inputs.illumima_adapters,
, this suggests that there were other differences in your workflows.
I forget, did we try putting the JS expression in the valueFrom
field? That works for me locally, I think this would be the equivalent code for you:
trim:
run: tools/trimmomatic.cwl
in:
reads1:
source: read1
valueFrom: >
${
self.format = "http://edamontology.org/format_1930";
return self;
}
reads2:
source: read2
valueFrom: >
${
self.format = "http://edamontology.org/format_1930";
return self;
}
end_mode:
default: PE
nthreads:
valueFrom: ${ return 2; }
illuminaClip:
source: adaptersx
valueFrom: |
${ return {
"adapters": self,
"seedMismatches": 1,
"palindromeClipThreshold": 20,
"simpleClipThreshold": 20,
"minAdapterLength": 4,
"keepBothReads": true };
}
out: [output_log, reads1_trimmed, reads1_trimmed_unpaired, reads2_trimmed_paired, reads2_trimmed_unpaired]
You'll need to make sure the Workflow level requirements
has something like this
requirements:
SchemaDefRequirement:
types:
- $import: trimmomatic-illumina_clipping.yaml
@mr-c That is a far simpler solution and possibly even consistent with my slowly developing understanding of how CWL works. It also works with both the reference implementation and toil.
Thanks!
This is the stacktrace:
This is the failing workflow step:
Using print statements I know it is failing on the adapters file - there is no fileStoreId for this file because it is not in the filestore.
The workflow works with the CWL reference implementation.