Closed jsmedmar closed 4 years ago
If you look at the underlying commands you will find that the trailing ends are reading from a pipe. Yes, they could be split further, however you'd add a round trip of the data to disk.
`stats` is a no-op step (other than some cleanup) since a recent switch around, but is left in place for backwards compatibility.
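To illustrate the pipe point, here is a minimal Python sketch of two processes joined by a pipe, so the intermediate data never touches disk. The producer and consumer commands are hypothetical stand-ins (small Python one-liners), not the actual commands run by `bwa_mem.pl`.

```python
import subprocess
import sys

# Hypothetical stand-ins for the real tools: a producer that emits lines and
# a consumer that reads them from stdin. Neither is a real bwa/samtools call.
PRODUCER = [sys.executable, "-c", "for i in range(5): print(i)"]
CONSUMER = [sys.executable, "-c", "import sys; print(sum(int(l) for l in sys.stdin))"]


def run_piped(producer, consumer):
    """Run `producer | consumer` with no intermediate file on disk."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer gets SIGPIPE if the consumer exits early.
    first.stdout.close()
    out, _ = second.communicate()
    first.wait()
    return out.decode().strip()


result = run_piped(PRODUCER, CONSUMER)
print(result)
```

Splitting the two halves into separate jobs would mean writing the producer's output to a file and reading it back, which is the extra round trip to disk mentioned above.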
Ok Keiran, thanks so much for the clarification. Any clue why the `mark` step seems to use exactly half of the `-threads` passed?
The marking step (like many operations in samtools) will only use threads for compression. We don't run single lanes with more than 8-10 threads for any step, and when bwa-mem2 is finally stable we'll probably drop to 6.
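As a rough sketch of what capping threads per step could look like when each step is submitted separately, the helper below picks a thread count per step. The cap values are assumptions for illustration only: 8 is taken from the "8-10 threads" remark, and the value for `stats` is a guess based on it being described as a no-op. Nothing here is part of `bwa_mem.pl` itself.

```python
# Hypothetical per-step thread caps. The value 8 reflects the "8-10 threads"
# remark; the entry for stats is an assumption (it is described as a no-op).
STEP_THREAD_CAPS = {"bwamem": 8, "mark": 8, "stats": 1}


def threads_to_request(step, cores_available, caps=STEP_THREAD_CAPS):
    """Return how many threads to request for a step, never exceeding its cap."""
    cap = caps.get(step, cores_available)  # steps without a cap are left uncapped
    return max(1, min(cores_available, cap))


# On a 64-core machine, every capped step requests far fewer than 64 threads.
plan = {step: threads_to_request(step, 64) for step in STEP_THREAD_CAPS}
print(plan)
```

The point is only that a scheduler request sized per step avoids reserving cores that a step such as marking will not use.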
This is the CPU utilization profile of `bwa_mem.pl` aligning a 180x coverage genome from a BAM file using `-bm2` on a 64-CPU machine, using `-threads 64` for all steps.

My questions are:
- `bwamem`: by `8:00` the `bwamem` step stops using the requested resources for about 3.5h. I was wondering if it's worth splitting this process into two?
- `mark`: same question, is it worth splitting the `mark` step into two processes?

I'm now running `bwa_mem.pl` using the `-process` and `-index` flags, which makes it easier to request the right amount of compute resources per step, but wanted to check on these two points.

The actual running stats: