The COWorkflowsBasePlugin (Computational Oncology Workflows Base Plugin for Roddy) provides general classes and a framework used by some of the other Roddy plugins. This includes both JVM-based code (Java, Groovy) and command-line tools used in cluster jobs.
The alignment folder is referenced several times. For the plugin to work, it is currently necessary to have a folder for your dataset, e.g.:
/tmp/[dataset_id]
Inside this, you will need to create the alignment subfolder:
/tmp/[dataset_id]/alignment
And inside this, you may have to place or link your merged BAMs (depending on the workflow), e.g.:
/tmp/[dataset_id]/alignment/[sample_id]_[dataset_id]_merged.rmdup.bam
/tmp/[dataset_id]/alignment/[sample_id]_[dataset_id]_merged.rmdup.bam.bai
It should be possible to just link the files in there.
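The layout above can be prepared with a few lines of scripting. The following is a minimal sketch, not part of the plugin: the helper name `link_merged_bam` is hypothetical, and it assumes that the index file sits next to the source BAM under the name `<source>.bai`.

```python
import os
from pathlib import Path


def link_merged_bam(dataset_dir, sample_id, source_bam):
    """Link a merged BAM and its index into <dataset_dir>/alignment.

    Hypothetical helper for illustration only. Assumes the index of
    source_bam is located at "<source_bam>.bai".
    """
    dataset_dir = Path(dataset_dir)
    source_bam = Path(source_bam)

    # Create /.../[dataset_id]/alignment if it does not exist yet.
    alignment_dir = dataset_dir / "alignment"
    alignment_dir.mkdir(parents=True, exist_ok=True)

    # The dataset id is taken from the name of the dataset folder.
    dataset_id = dataset_dir.name
    target_bam = alignment_dir / f"{sample_id}_{dataset_id}_merged.rmdup.bam"
    target_bai = Path(str(target_bam) + ".bai")

    # Link rather than copy, as suggested above.
    os.symlink(source_bam, target_bam)
    os.symlink(str(source_bam) + ".bai", target_bai)
    return target_bam, target_bai
```

Calling `link_merged_bam("/tmp/[dataset_id]", "[sample_id]", "/path/to/source.bam")` with your own identifiers and source path would produce the two entries shown above.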
So whenever we speak of the alignment folder, we basically mean the structure described above. You can change the name of the alignment folder by overriding the following value in your XML configuration:
<cvalue name='alignmentOutputDirectory' value='alignment' type="path"/>
Switch | Default | Description |
---|---|---|
isNoControlWorkflow | false | Set to true to allow this workflow to work without a control bam file. |
workflowSupportsMultiTumorSamples | false | Allow the workflow to run with several tumor bam files. This is done with a for loop (see code documentation in WorkflowUsingMergedBam) |
To extract samples from filenames, multiple methods exist or are planned. You can control the workflow's behaviour with the variable "selectSampleExtractionMethod".
Valid values with their control variables are:
Switch | Value | Description |
---|---|---|
selectSampleExtractionMethod | version_1 (Default) | The old method for extracting the sample from the filename. |
selectSampleExtractionMethod | version_2 | The new version. |
The version_1 method is very (too) simple and just splits the filename on underscores '_'. Afterwards, it takes the first token and uses it as the sample name. Further control is possible with:
Switch | Default | Description |
---|---|---|
enforceAtomicSampleName | false | Defines whether the method shall append '_' to the search pattern. The method then searches e.g. for 'control_' or 'tumorsomething_' |
Please take a close look at the file SampleFromFilenameExtractorVersionOneTest to see a table of filenames and expected samples.
Note that, in contrast to version_2, this method does not take the configured samples in possibleControlSampleNamePrefixes and possibleTumorSampleNamePrefixes into account and will return any file prefix separated by "_". So you should not have underscores in your sample names.
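The version_1 splitting rule can be sketched in a few lines. This is an illustration of the rule as described, not the plugin's code; the function name `extract_sample_version_1` is hypothetical and the enforceAtomicSampleName switch is not modelled.

```python
def extract_sample_version_1(filename):
    """Return the part of the filename before the first underscore."""
    # Split on "_" and keep the first token as the sample name.
    return filename.split("_")[0]


# A sample name without underscores is returned as expected ...
simple = extract_sample_version_1("control_some_merged.bam")
# ... but a sample name containing an underscore is cut at the first one.
truncated = extract_sample_version_1("tumor_02_some_merged.bam")
```

Here `simple` is "control", while `truncated` is only "tumor", which is why sample names with underscores do not work with this method.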
The version_2 method is quite complex and can detect a variety of samples. The basic settings will use the samples set in possibleControlSampleNamePrefixes and possibleTumorSampleNamePrefixes as prefixes for the sample search. E.g. "con" will extract "control" from "control_some_merged.bam" and "control02" from "control02_some_merged.bam". Like in version_1, "_" is used as a delimiter for the extraction. Note that, in contrast to version_1, samples may contain "_" delimiters in their name! A sample prefix like "control_sample" will work.
Before the sample is extracted, both possible... lists are joined and sorted in reverse order. Let's say you have:
possibleControlSampleNamePrefixes=( control control02 control_sample )
possibleTumorSampleNamePrefixes=( tumor xeno tumor_02 )
you will get the following list for the extraction:
xeno
tumor_02
tumor
control_sample
control02
control
We do this to search for the most specific sample prefix first. Otherwise, in the case above, control would be preferred over the more specific control_sample or control02.
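The ordering can be reproduced with a plain reverse lexicographic sort. The sketch below uses the two example lists from above; the variable names are chosen for illustration and the plugin's own implementation may differ in detail.

```python
# Example prefix lists taken from the configuration shown above.
possible_control_prefixes = ["control", "control02", "control_sample"]
possible_tumor_prefixes = ["tumor", "xeno", "tumor_02"]

# Join both lists and sort in reverse order, so that longer (more specific)
# prefixes come before the shorter prefixes they start with.
search_order = sorted(
    possible_control_prefixes + possible_tumor_prefixes, reverse=True
)

for prefix in search_order:
    print(prefix)
```

Because every longer prefix that starts with a shorter one sorts after it in ascending order, reversing the sort guarantees that "control_sample" and "control02" are tried before "control".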
You can modify the search behaviour with several switches:
Switch | Default | Description |
---|---|---|
matchExactSampleNames | false | If set, samples are extracted exactly as they are set in the config. This is compatible with allowSampleTerminationWithIndex. |
allowSampleTerminationWithIndex | true | Allow recognition of trailing integer numbers for sample names, where the index may be separated by an underscore from the prefix, e.g. both "tumor02" and "tumor_02" would be matched with "possibleTumorSampleNamePrefixes=tumor". |
useLowerCaseFilenamesForSampleExtraction | true | Tells the method to work on lowercase filenames, i.e. filenames are converted to lower case before matching. |
Please take a close look at the file SampleFromFilenameExtractorVersionTwoTest. There is a large test case "Version_2: Extract sample name from BAM basename", which features a table with inputs, switches and expected output.
matchExactSampleName=false
allowSampleTerminationWithIndex=true
useLowerCaseFilenameForSampleExtraction=true
Note that these are the default settings for the version_2 algorithm.
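To make the default behaviour more tangible, here is a simplified sketch of prefix matching with an optional trailing index. It is an assumption-laden illustration, not the plugin's algorithm: the function name and the regular expression are hypothetical, only the default switch values are modelled, and the authoritative behaviour is the table in SampleFromFilenameExtractorVersionTwoTest.

```python
import re


def extract_sample_version_2(filename, control_prefixes, tumor_prefixes):
    """Simplified sketch of version_2 extraction with default switches.

    Assumed rules (hypothetical, for illustration only):
    - the filename is lowercased before matching,
    - prefixes are tried in reverse sorted order (most specific first),
    - a prefix may be continued up to the next "_" (e.g. "con" -> "control"),
    - an integer index separated by "_" may follow (e.g. "tumor_02").
    """
    name = filename.lower()
    prefixes = sorted(set(control_prefixes) | set(tumor_prefixes), reverse=True)
    for prefix in prefixes:
        # prefix, then non-underscore characters, then an optional "_<digits>",
        # terminated by the "_" that separates the sample from the rest.
        pattern = r"^(" + re.escape(prefix.lower()) + r"[^_]*(?:_[0-9]+)?)_"
        match = re.match(pattern, name)
        if match:
            return match.group(1)
    return None  # no configured prefix matched


sample = extract_sample_version_2("control02_some_merged.bam", ["con"], [])
```

With this sketch, `sample` is "control02", and a file such as "tumor_02_some_merged.bam" yields "tumor_02" for the prefix "tumor".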
If you want just exact matching to the names in the possible(Tumor|Control)SampleNamePrefixes, you can use
matchExactSampleName=true
allowSampleTerminationWithIndex=false
useLowerCaseFilenameForSampleExtraction=false
Also note that there is a variable called searchMergedBamWithSeparator, which defaults to "true".
<cvalue name='searchMergedBamWithSeparator' value='true' type="boolean"/>
It determines whether the sample name is separated from the patient identifier with an underscore "_". Leave this value set to "true" even with matchExactSampleNames, because otherwise you could still find more than one BAM file when they share the same prefix (e.g. "tumor" was extracted, but it will match both "tumor" and "tumor03" during the BAM file search).
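The effect of the separator can be illustrated with a simple prefix search over filenames. This is a sketch under assumptions: the function `find_merged_bams` and the example filenames are hypothetical, and the plugin's actual file search may use different patterns.

```python
def find_merged_bams(filenames, sample, search_with_separator=True):
    """Return BAM filenames that start with the sample name.

    With search_with_separator=True, the sample name must be followed
    directly by "_", which keeps "tumor" from matching "tumor03".
    """
    prefix = sample + "_" if search_with_separator else sample
    return [
        name for name in filenames
        if name.startswith(prefix) and name.endswith(".bam")
    ]


# Hypothetical content of an alignment folder.
files = [
    "tumor_pid_merged.rmdup.bam",
    "tumor_pid_merged.rmdup.bam.bai",
    "tumor03_pid_merged.rmdup.bam",
]

with_separator = find_merged_bams(files, "tumor", search_with_separator=True)
without_separator = find_merged_bams(files, "tumor", search_with_separator=False)
```

In this example `with_separator` contains only the "tumor" BAM, whereas `without_separator` also contains the "tumor03" BAM, which is the ambiguity the switch is meant to avoid.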
Not implemented, but planned.
Switch | Default | Description |
---|---|---|
extractSamplesFromOutputFiles | false | If this is true and samples are not passed by metadata table, configuration or sample list, samples are extracted from files in the alignment folder. |
extractSampleNameOnlyFromBamFiles | false | By default, the method will search for samples in all files in the alignment directory. With this switch, you can restrict it to BAM files. |
Version update to 1.4.2
Version update to 1.4.1
Version update to 1.4.0
Version update to 1.3.0
Version update to 1.2.1
Version update to 1.2.0
Version update to 1.1.1
Version update to 1.1.0
Version update to 1.0.3
Version update to 1.0.2
Version update to 1.0.1
Version update to 1.0.0
Version update to 1.1.20
The original code came from the COWorkflows plugin.