ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

how to pass-through extra parameters to modules from top-level cf command? #79

Closed avilella closed 8 years ago

avilella commented 8 years ago

This may already be explained somewhere, but I want to make sure I am using a clean way for this.

I am running a Cluster Flow pipeline in which one module needs different --trimtype and --trimlen values depending on which sample folder the *.fastq.gz files are in. For example:

cd ./folder1
cf --genome human --trimtype R1 --trimlen 6 *.fastq.gz
cd ../folder2
cf --genome human --trimtype R1R2 --trimlen 12 *.fastq.gz

What is the best way to pass through these two extra options to the pipeline and the module within the pipeline so that folder1 is executed in one way and folder2 is executed in the other way?

Thx

ewels commented 8 years ago

You want to pass these as parameters using the --param command line option (or by having different pipeline configs, where you can specify the param after the module name).

The Trim Galore! module is already set up to recognise a number of named trimming presets. These are the ones I've defined so far:


my $clip_r1 = "";
my $clip_r2 = "";
if(defined($cf{'params'}{'pbat'})){
    $clip_r1 = "--clip_r1 4";
    $clip_r2 = "--clip_r2 4";
}
if(defined($cf{'params'}{'single_cell'})){
    $clip_r1 = "--clip_r1 9";
    $clip_r2 = "--clip_r2 9";
}
if(defined($cf{'params'}{'epignome'})){
    $clip_r1 = "--clip_r1 6 --three_prime_clip_r1 6";
    $clip_r2 = "--clip_r2 6 --three_prime_clip_r2 6";
}
if(defined($cf{'params'}{'accel'})){
    $clip_r1 = "--clip_r1 10 --three_prime_clip_r1 10";
    $clip_r2 = "--clip_r2 15 --three_prime_clip_r2 10";
}
if(defined($cf{'params'}{'cegx'})){
    $clip_r1 = "--clip_r1 6 --three_prime_clip_r1 2";
    $clip_r2 = "--clip_r2 6 --three_prime_clip_r2 2";
}
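As a minimal sketch of choosing one of these presets at submission time: the helper below only prints the command (a dry run) and rejects names the module snippet above does not check for. The pipeline name fastq_bismark and genome GRCh37 are assumptions taken from the examples in this thread, and cf_preset_cmd is a hypothetical helper name, not part of Cluster Flow.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the cf command for one of the named trimming presets.
# Preset names come from the module snippet above; the pipeline (fastq_bismark)
# and genome (GRCh37) are assumptions for illustration.

cf_preset_cmd() {
    local preset="$1"
    case "$preset" in
        pbat|single_cell|epignome|accel|cegx)
            # The glob sits inside double quotes, so it is printed literally
            # rather than expanded by the shell.
            echo "cf --genome GRCh37 --param $preset fastq_bismark *.fastq.gz"
            ;;
        *)
            echo "unknown trimming preset: $preset" >&2
            return 1
            ;;
    esac
}

cf_preset_cmd epignome
```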

But there's nothing special about the strings that are passed. So if you really want the trim length to be variable on the Cluster Flow command line, you could add another if() statement that checks for a known prefix and then takes the rest of the string as the value. For example (untested):

cf --genome GRCh37 --param trimR1_6 --param trimR2_12 fastq_bismark *.fastq.gz

or a new fastq_bismark_trim1.config:

/*
[...]
#fastqc
#trim_galore    trimR1_6 trimR2_12
    #bismark_align
[..]

With a modified Trim Galore module to accept them:

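# Params are stored as hash keys (see the defined() checks above), so loop over the keys
foreach my $param (keys %{$cf{'params'}}){
    # 'trimR1_' and 'trimR2_' are 7 characters long; the rest of the string is the clip length
    if(substr($param, 0, 7) eq 'trimR1_'){
        $clip_r1 = "--clip_r1 ".substr($param, 7);
    }
    if(substr($param, 0, 7) eq 'trimR2_'){
        $clip_r2 = "--clip_r2 ".substr($param, 7);
    }
}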

See the docs on this subject here (not much, should probably write more).
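To tie this back to the two folders in the original question, here is a rough dry-run sketch that maps a trim type and length onto the --param strings suggested above. It assumes the modified Trim Galore module (the trimR1_ / trimR2_ prefixes are not recognised by the stock module), and trim_params and cf_trim_cmd are hypothetical helper names. Like the command above, it is untested against a real cluster.

```shell
#!/usr/bin/env bash
# Dry-run sketch: map a (trim type, trim length) pair onto --param strings.
# Assumes the modified module that understands trimR1_<n> / trimR2_<n>.
# trim_params and cf_trim_cmd are hypothetical helpers, not Cluster Flow commands.

trim_params() {
    local type="$1" len="$2"
    # Reject empty or non-numeric lengths before building any parameter.
    case "$len" in
        ''|*[!0-9]*) echo "trim length must be a whole number: $len" >&2; return 1 ;;
    esac
    case "$type" in
        R1)   echo "--param trimR1_$len" ;;
        R2)   echo "--param trimR2_$len" ;;
        R1R2) echo "--param trimR1_$len --param trimR2_$len" ;;
        *)    echo "unknown trim type: $type" >&2; return 1 ;;
    esac
}

cf_trim_cmd() {
    local params
    params="$(trim_params "$1" "$2")" || return 1
    # The glob is inside double quotes, so it is printed, not expanded.
    echo "cf --genome GRCh37 $params fastq_bismark *.fastq.gz"
}

# Print the command each folder from the question would use.
echo "folder1: $(cf_trim_cmd R1 6)"
echo "folder2: $(cf_trim_cmd R1R2 12)"
```

Once the module change is in place, the printed commands could be run from inside each folder instead of being echoed.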