Open oneillkza opened 6 years ago
Thanks, @mr-c. I'm still trying to figure out how to scatter across the contents of a directory, though (i.e., I have a directory full of input files, and I want to apply the preprocessing workflow to every one of them).
I haven't yet been able to find an example of that anywhere. I have seen some hints that having an input of type Directory and using inputs.inputdir.listing should work, but I'm not having any luck yet.
Ah, this is because scatter runs before valueFrom; see the discussion at https://github.com/common-workflow-language/common-workflow-language/issues/419
As a workaround, add a step to turn the directory into an array of Files (discarding any subdirectories) or change your inputs to be an array of Files outright.
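For the second option (taking an array of Files outright), a minimal sketch might look like the following. The input name `files` and the step name `list_names` are made up for illustration; the scattered step just returns each file's basename as a stand-in for real work.

```yaml
#!/usr/bin/env cwl-runner
# Sketch: accept the files as a File[] input so scatter can use it directly.
# "files" and "list_names" are illustrative names, not from the original thread.
cwlVersion: v1.0
class: Workflow
requirements: { ScatterFeatureRequirement: {} }
inputs:
  files: File[]
outputs:
  names:
    type: string[]
    outputSource: list_names/basename
steps:
  list_names:
    in: { file: files }
    scatter: file            # one job per element of the File[] input
    run:
      class: ExpressionTool
      requirements: { InlineJavascriptRequirement: {} }
      inputs: { file: File }
      expression: |
        ${ return { "basename": inputs.file.basename }; }
      outputs: { basename: string }
    out: [ basename ]
```

With this layout the job file has to list every file explicitly under `files`, each as an entry with `class: File` and a `path`, which is the trade-off against passing a single Directory.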
Here's an example step with an inline ExpressionTool to convert a Directory to an array of Files:
directory_to_array:
  in: { directory: some_step/some_directory }
  run:
    class: ExpressionTool
    requirements: { InlineJavascriptRequirement: {} }
    inputs: { directory: Directory }
    expression: |
      ${
        var i, len = inputs.directory.listing.length;
        for (i = len - 1; i >= 0; i--) {
          if (inputs.directory.listing[i].class != 'File') {
            inputs.directory.listing.splice(i, 1);
          }
        }
        return { "array_of_files": inputs.directory.listing };
      }
    outputs:
      array_of_files: File[]
  out: [ array_of_files ]
Or, as a slightly more verbose external tool for reuse:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
label: Convert a Directory to an array of Files, skipping subfolders
inputs:
  directory:
    type: Directory
expression: |
  ${
    var i, len = inputs.directory.listing.length;
    for (i = len - 1; i >= 0; i--) {
      if (inputs.directory.listing[i].class != 'File') {
        inputs.directory.listing.splice(i, 1);
      }
    }
    return { "array_of_files": inputs.directory.listing };
  }
outputs:
  array_of_files:
    type: File[]
Here's the same tool in a self-contained workflow that uses scatter:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements: { ScatterFeatureRequirement: {} }
inputs: { dir: Directory }
outputs:
  names:
    type: string[]
    outputSource: list_array/basename
steps:
  directory_to_array:
    in: { directory: dir }
    run:
      class: ExpressionTool
      requirements: { InlineJavascriptRequirement: {} }
      inputs: { directory: Directory }
      expression: |
        ${
          var i, len = inputs.directory.listing.length;
          for (i = len - 1; i >= 0; i--) {
            if (inputs.directory.listing[i].class != 'File') {
              inputs.directory.listing.splice(i, 1);
            }
          }
          return { "array_of_files": inputs.directory.listing };
        }
      outputs:
        array_of_files: File[]
    out: [ array_of_files ]
  list_array:
    in: { file: directory_to_array/array_of_files }
    run:
      class: ExpressionTool
      requirements: { InlineJavascriptRequirement: {} }
      inputs: { file: File }
      expression: |
        ${ return { "basename": inputs.file.basename }; }
      outputs: { basename: string }
    out: [ basename ]
    scatter: file
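One caveat on the Directory-to-array expression: it relies on `inputs.directory.listing` being populated. As far as I understand the spec, that is the usual behaviour under `cwlVersion: v1.0`, but from v1.1 onward the listing is not loaded unless requested. A hedged sketch of the same tool for newer runners, assuming one that implements `LoadListingRequirement`, and using `filter` so the input object is not modified in place:

```yaml
#!/usr/bin/env cwl-runner
# Sketch, assuming a runner that supports CWL v1.1 or later.
cwlVersion: v1.2
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
  LoadListingRequirement:
    loadListing: shallow_listing   # load only the top level of the Directory
label: Convert a Directory to an array of Files, skipping subfolders
inputs:
  directory:
    type: Directory
expression: |
  ${
    // Keep only File entries; subdirectories are dropped.
    var files = inputs.directory.listing.filter(function (entry) {
      return entry.class === 'File';
    });
    return { "array_of_files": files };
  }
outputs:
  array_of_files:
    type: File[]
```

The expression sticks to ES5 syntax (a plain `function` rather than an arrow function), since that is the JavaScript level CWL expressions are specified against.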
Estimating this as a 3
For a real data set, we need to run preProcess across every library, then gather the results to run pairwise-distance and heatmap (and various other tasks, e.g. pooling methylation and then generating BigWIGs from that).
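A rough sketch of that scatter-then-gather shape is below. The tool file names (`preProcess.cwl`, `pairwise-distance.cwl`, `heatmap.cwl`) and every port name are hypothetical placeholders, since the actual tool interfaces are not shown in this thread. The point is only that a scattered step's output becomes an array, and a downstream step that is not scattered receives that whole array, which is the "gather".

```yaml
#!/usr/bin/env cwl-runner
# Sketch only: tool files and all port names are hypothetical placeholders.
cwlVersion: v1.0
class: Workflow
requirements: { ScatterFeatureRequirement: {} }
inputs:
  libraries: File[]
outputs:
  heatmap_plot:
    type: File
    outputSource: heatmap/plot
steps:
  preprocess:
    run: preProcess.cwl            # hypothetical: one File in, one File out
    in: { library: libraries }
    scatter: library               # one preProcess job per library
    out: [ processed ]
  pairwise_distance:
    run: pairwise-distance.cwl     # hypothetical: File[] in, one File out
    in: { processed_libraries: preprocess/processed }   # gathered File[]
    out: [ distances ]
  heatmap:
    run: heatmap.cwl               # hypothetical
    in: { distances: pairwise_distance/distances }
    out: [ plot ]
```

If `preProcess.cwl` is itself a Workflow rather than a CommandLineTool, the outer workflow also needs `SubworkflowFeatureRequirement: {}` in its requirements.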
@mr-c suggested taking a look at https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl for examples of the scatter and gather functionality in CWL.