Closed leowill01 closed 8 months ago
I haven't noticed this, but I also haven't paid close attention.
It seems like there might be some discussion of something related over on the bowtie2 issues...
Thanks for the find! Seems that issue has been open quite a while. Have there ever been any plans to incorporate a choice of aligner (e.g., opting to use bwa-mem2 instead of bowtie2)?
I'll keep testing to see if it's a problem stemming from elsewhere, like within parallel.
It would be very difficult to substitute another aligner and get full breseq functionality.
In particular, the junction prediction steps require finding split-read matches, and breseq tracks all equivalent locations to which a read aligns. Not all aligners are good at doing these things. Most are optimized for finding the best match and/or randomly assigning a read to one of several equivalent locations.
There is an option to use your own aligned SAM files of reads as input to breseq (--aligned-sam), in which case it will skip the alignment steps. But it can't call JC evidence that way, so you might as well use any other SNP / small indel calling program. So, I wouldn't recommend going down that road.
$ breseq -h
...
--aligned-sam Input files are aligned SAM files, rather than FASTQ
files. Junction prediction steps will be skipped. Be
aware that breseq assumes: (1) Your SAM file is
sorted such that all alignments for a given read are
on consecutive lines. You can use 'samtools sort -n'
if you are not sure that this is true for the output
of your alignment program. (2) You EITHER have
alignment scores as additional SAM fields with the
form 'AS:i:n', where n is a positive integer and
higher values indicate a better alignment OR it
defaults to calculating an alignment score that is
equal to the number of bases in the read minus the
number of inserted bases, deleted bases, and soft
clipped bases in the alignment to the reference. The
default highly penalizes split-read matches (with
CIGAR strings such as M35D303M65).
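To make the two assumptions in that help text concrete, here is a minimal Python sketch of how I read them. This is not breseq's actual code: the function names are made up for illustration, CIGAR strings are parsed in the standard SAM order (count before operation, e.g. 35M303D65M), and hard-clipped bases are assumed not to count toward the read length.

```python
import re

# Standard SAM CIGAR tokens: a count followed by an operation letter.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume read bases (hard clips excluded: an assumption).
READ_BASE_OPS = set("MIS=X")


def reads_are_grouped(read_names):
    """Return True if all alignments for each read are on consecutive lines.

    read_names is the sequence of QNAME values in file order.
    """
    seen = set()
    previous = None
    for name in read_names:
        if name != previous:
            if name in seen:
                # This read name appeared earlier in a different run of lines.
                return False
            seen.add(name)
            previous = name
    return True


def default_alignment_score(cigar):
    """Fallback score as described in the help text: read length minus
    inserted, deleted, and soft-clipped bases (hypothetical helper)."""
    ops = [(int(count), op) for count, op in CIGAR_TOKEN.findall(cigar)]
    read_length = sum(count for count, op in ops if op in READ_BASE_OPS)
    penalty = sum(count for count, op in ops if op in "IDS")
    return read_length - penalty


if __name__ == "__main__":
    print(reads_are_grouped(["r1", "r1", "r2"]))      # grouped by name
    print(reads_are_grouped(["r1", "r2", "r1"]))      # r1 is split up
    print(default_alignment_score("100M"))            # full-length match
    print(default_alignment_score("35M303D65M"))      # split-read match
```

Under this reading, a 100-base read aligned as 35M303D65M scores 100 - 303 = -203, which is why the fallback score penalizes split-read matches so heavily unless the aligner supplies its own AS:i:n values.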
I would have thought that disk read/write would be more limiting if you launch many breseq runs that hit the bowtie2 alignment step at the same time.
I'm running multiple calls to breseq using GNU parallel, but I only allow each breseq call to use 1 CPU core with -j 1. However, when looking at my process monitor, I see that whenever breseq calls a subprocess step for bowtie2, it uses more than 1 CPU core: I've logged this as bowtie2 using 200% CPU (i.e. 2 cores) when -j 1 and 300% CPU (3 cores) when -j 2. Interestingly, the breseq output shows that every call to bowtie2 is made with -p 1, so I'm not sure why it would be trying to use more than 1 core.
This is causing problems when trying to efficiently schedule cores per job using parallel with my scripts, because I assume that 1 core = 1 job. When breseq/bowtie2 uses more than 1 core, it causes CPU overhead and clogs up the threads.
Anyone come across this before?
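Until the cause is found, one workaround is to budget jobs by the CPU usage actually observed rather than by the -j value. A rough Python sketch of that arithmetic follows. The function names and the 16-core machine are hypothetical; the 200% and 300% figures are the ones logged above.

```python
import math


def cores_per_job(observed_cpu_percent):
    """Round observed %CPU up to whole cores (200% -> 2 cores)."""
    return max(1, math.ceil(observed_cpu_percent / 100.0))


def max_concurrent_jobs(total_cores, observed_cpu_percent):
    """How many jobs fit if each one really uses the observed CPU share."""
    return max(1, total_cores // cores_per_job(observed_cpu_percent))


if __name__ == "__main__":
    total_cores = 16  # hypothetical machine size
    # breseq -j 1 was observed at about 200% CPU during the bowtie2 step
    print(max_concurrent_jobs(total_cores, 200))
    # breseq -j 2 was observed at about 300% CPU
    print(max_concurrent_jobs(total_cores, 300))
```

The result would then be the number of simultaneous jobs to give GNU parallel, instead of assuming 1 core = 1 job.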