This should be described in the --help text; there is an example at the bottom:
Sets the number of parallel instances of Bismark to be run concurrently. This forks the Bismark alignment step very early on so that each individual spawn of Bismark processes only every n-th sequence (n being set by --parallel). Once all processes have completed, the individual BAM files, mapping reports, unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance. If system resources are plentiful, this is a viable option to speed up the alignment process (we observed a near-linear speed increase for up to --parallel 8 tested). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie2/HISAT2, Samtools, gzip etc.) and ~10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel is resource hungry! Each value of --parallel specified will effectively lead to a linear increase in compute and memory requirements, so --parallel 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned.
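As a rough back-of-the-envelope sketch of the scaling described above (this is not Bismark's actual accounting; the per-instance thread and memory numbers are assumptions taken from the help text):

```python
def estimate_footprint(parallel, aligner_threads=2, helpers=2, mem_per_instance_gb=10):
    """Rough estimate: each --parallel instance runs Bismark itself,
    its aligner threads, and helper processes (Samtools, gzip), and
    holds its own copy of the genome index in memory."""
    cores = parallel * (1 + aligner_threads + helpers)
    memory_gb = parallel * mem_per_instance_gb
    return cores, memory_gb

# --parallel 4 on a mouse-sized genome lands near the ~20 cores / ~40GB
# figure quoted in the help text:
print(estimate_footprint(4))  # → (20, 40)
```

The point of the sketch is just that both cores and memory grow linearly in --parallel, so doubling it doubles the footprint.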
Does this answer your questions?
WARNING: Bismark Parallel is resource hungry!
I have faced this for sure. Yes, --parallel 8 helped to avoid that, as suggested.
Also, I used another server to finish all my samples ASAP, and it seems to have crashed in the middle of the run, ending up with a BAM file reporting 50% mapping efficiency. I re-mapped the same sample on the main server I have access to (everything has been working fine there) and it ended up with 80% mapping efficiency. Maybe a crashed sub-process should stop the job rather than running to the very end and merging everything into an incorrect final file. I can imagine this might be hard to fix, but I just wanted to share my experience here.
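One way a pipeline could catch this before merging (a sketch of my own, not something Bismark does; samtools quickcheck performs a similar test) is to verify that each per-instance BAM still ends with the BGZF EOF marker, since files truncated by a crash usually lack it:

```python
# The 28-byte BGZF EOF block that terminates every intact BAM file
# (defined in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)

def bam_looks_complete(path):
    """Cheap integrity check: a BAM truncated mid-write by a crashed
    run will usually be missing the trailing BGZF EOF marker."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)                      # jump to end of file
        if fh.tell() < len(BGZF_EOF):
            return False
        fh.seek(-len(BGZF_EOF), 2)         # rewind to the last 28 bytes
        return fh.read() == BGZF_EOF
```

In practice, running samtools quickcheck on each intermediate BAM does this (plus header sanity checks) and would be the standard way to fail fast instead of merging a corrupt file.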
Regardless of that, I'm almost done with this set of samples I'm processing; thanks for the feedback here :)
(closing this issue here)
Hi @FelixKrueger – I'm trying to limit the memory and CPU usage to make sure other labmates can use our shared server and also my job runs at a reasonable speed. I have two questions / concerns:
The --parallel option is not actually limiting the number of cores used by Bowtie?! I do see that more than 8 cores are used!

(collapsed attachments: bismark command, FASTQ file sizes)
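One option for a hard cap on a shared server (an assumption on my part, not something suggested in this thread) is CPU affinity: the mask is inherited by child processes, so Bismark's forked instances and their Bowtie2 threads all share the same pool of CPUs no matter how many threads they start. A minimal Linux-only sketch:

```python
import os

def cap_cpus(n):
    """Pin the current process to at most n logical CPUs (Linux-only).
    Children spawned afterwards, e.g. a Bismark run with all of its
    forked instances and aligner threads, inherit this affinity mask."""
    allowed = sorted(os.sched_getaffinity(0))[:n]
    os.sched_setaffinity(0, set(allowed))
    return allowed

# After cap_cpus(8), launch Bismark via subprocess; the command below is
# a placeholder for your actual invocation, not a recommended setting:
# subprocess.run(["bismark", "--genome", genome_dir, "--parallel", "2", fastq])
```

The shell equivalent would be launching with taskset -c 0-7 bismark ...; note that --parallel still controls how many instances fork, while affinity only caps the total CPUs they compete for.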