bismark aligning comma-separated list of fastq files stops after first sample finished

Dear colleagues,

I am re-running a WGBS pipeline to see how well it can be replicated with the partial code I have.

I am not stuck, but I want the code to be more efficient, so that I do not have to start the next sample (pair) by hand each time one sample has finished aligning. So I use the following script:

    echo "Bismark aligning"
    input_files_1=""
    input_files_2=""
    for file in fastq/trim/*_R1_001_val_1.fq.gz; do
        input_files_1+="${file},"
    done
    for file in fastq/trim/*_R2_001_val_2.fq.gz; do
        input_files_2+="${file},"
    done
    input_files_1=${input_files_1%,}  # Remove the trailing comma
    input_files_2=${input_files_2%,}

    bismark --genome ~/bioinformatics/ref_genomes/mouse_38/genome \
        -1 "${input_files_1}" -2 "${input_files_2}" \
        -o BAM/prededuplicate/ --temp_dir BAM/ \
        --parallel 3 -q --scoremin L,0,-0.2 --maxins 500
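For comparison, here is a minimal sketch of the alternative I could fall back on: calling bismark once per sample pair in a loop, instead of passing comma-separated lists. It does not explain why the list-based run stopped. The function names `r2_for` and `align_all_pairs` are made up for illustration, and the file suffixes are assumed from the file names listed in this issue. The bismark options are the same ones as in my script.

```shell
# Hypothetical helper: derive the R2 file name from an R1 file name.
# Assumes the suffixes _R1_001_val_1.fq.gz / _R2_001_val_2.fq.gz.
r2_for() {
    printf '%s\n' "${1%_R1_001_val_1.fq.gz}_R2_001_val_2.fq.gz"
}

# Run one bismark call per R1/R2 pair, one pair after the other.
align_all_pairs() {
    for r1 in fastq/trim/*_R1_001_val_1.fq.gz; do
        # If the glob matched nothing, the pattern stays literal: skip it.
        [ -e "$r1" ] || continue
        r2=$(r2_for "$r1")
        if [ ! -e "$r2" ]; then
            echo "Missing mate for $r1, skipping" >&2
            continue
        fi
        echo "Bismark aligning $r1 and $r2"
        # Stop at the first failing sample instead of silently moving on.
        bismark --genome ~/bioinformatics/ref_genomes/mouse_38/genome \
            -1 "$r1" -2 "$r2" \
            -o BAM/prededuplicate/ --temp_dir BAM/ \
            --parallel 3 -q --scoremin L,0,-0.2 --maxins 500 || return 1
    done
}

align_all_pairs
```

With this layout each sample gets its own bismark invocation, so a problem with one pair is visible right away in the terminal output.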

The input_files_1 variable then holds the following file names (comma-separated, as requested in bismark --help):

    fastq/trim/Ctrl-1_R1_001_val_1.fq.gz,fastq/trim/Ctrl-2_R1_001_val_1.fq.gz,fastq/trim/F1-1_R1_001_val_1.fq.gz,fastq/trim/F1-2_R1_001_val_1.fq.gz
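To double-check that the two lists line up entry by entry, a small sketch like the following could print them side by side. `show_pairs` is a made-up helper name; it only handles the two variables from my script and is not part of Bismark.

```shell
# Print the i-th entry of each comma-separated list on one line,
# separated by a tab, so mismatched pairs are easy to spot.
show_pairs() {
    list1=$1
    list2=$2
    while [ -n "$list1" ] || [ -n "$list2" ]; do
        printf '%s\t%s\n' "${list1%%,*}" "${list2%%,*}"
        # Drop the first entry; an entry without a comma is the last one.
        case $list1 in *,*) list1=${list1#*,} ;; *) list1= ;; esac
        case $list2 in *,*) list2=${list2#*,} ;; *) list2= ;; esac
    done
}

# Prints nothing if the variables are unset or empty.
show_pairs "${input_files_1:-}" "${input_files_2:-}"
```

If one list is longer than the other, the extra entries show up with an empty partner column.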

According to the first lines after starting the alignment, everything seems to be fine, as all FASTQ files were detected:

    Input files to be analysed (in current folder '/home/chuddy/bioinformatics/lamarck-project'):
    fastq/trim/Ctrl-1_R1_001_val_1.fq.gz
    fastq/trim/Ctrl-1_R2_001_val_2.fq.gz
    fastq/trim/Ctrl-2_R1_001_val_1.fq.gz
    fastq/trim/Ctrl-2_R2_001_val_2.fq.gz
    fastq/trim/F1-1_R1_001_val_1.fq.gz
    fastq/trim/F1-1_R2_001_val_2.fq.gz
    fastq/trim/F1-2_R1_001_val_1.fq.gz
    fastq/trim/F1-2_R2_001_val_2.fq.gz
    Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)

After 887 minutes of running time, I received a BAM file, which looked okay, also judging by the detection of C in CpG context, etc. However, the run then stopped instead of continuing with the next sample.

What did I do wrong? Normally the alignment of the second sample should start immediately after the first has finished. Also, since 887 minutes is a long time, I wonder how I can speed things up. I find it difficult to estimate what my mobile workstation is capable of. I used --parallel 3 to be on the safe side, although I have 24 CPUs and approx. 62 GB of memory. I am working with the mouse genome (mm10, from Ensembl).
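In case it helps with judging the hardware, this is a small sketch of how I can report the CPU count and memory on the workstation. `cpu_count` is a made-up helper name. It assumes a Linux system where `nproc` or `getconf` is available, and `free` is only called if it is installed.

```shell
# Report the number of CPUs available to this shell.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

echo "CPUs available: $(cpu_count)"

# Show total and free memory in GiB, if the 'free' tool exists.
if command -v free >/dev/null 2>&1; then
    free -g
fi
```

Running this before and during an alignment gives a rough picture of how much headroom is left for a higher --parallel value.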

Bismark version: v0.24.1; Bowtie 2 version: 2.5.1

If anything else is needed to help me, please tell me and I will happily provide it.

Best,
Tom

FelixKrueger / Bismark, issue #637