BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0

Alignment performance decreases with higher `-p` option value #478

Open LawrenceLiu023 opened 1 month ago

LawrenceLiu023 commented 1 month ago

I am using Bismark to align methylation NGS data; the alignment engine it uses is bowtie2. Bismark has 2 options related to multi-core processing: `--parallel` determines how many bowtie2 instances are launched simultaneously, and `-p` is passed through as the `-p` option of bowtie2.

I randomly sampled 100,000 and 1,000,000 read pairs from my fastq.gz file and tried different settings of the `--parallel` and `-p` options. The results seem to suggest that once `-p` is higher than 3, increasing it further makes alignment take *more* time. I wonder whether there is an optimal `-p` setting, and whether this is a normal phenomenon. The test results are as follows:
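As a rough sanity check before benchmarking, it can help to estimate the core budget. The sketch below is illustrative (the values are not from this thread): Bismark launches `--parallel` alignment workers and each worker runs bowtie2 with `-p` threads, so the alignment stage alone needs at least their product in cores, plus extra Bismark processes for I/O.

```shell
# Rough core-budget sketch (illustrative values, not from the thread).
# Bismark launches --parallel alignment workers; each worker runs
# bowtie2 with -p threads, so the alignment stage alone keeps at least
# PARALLEL * P threads busy (Bismark adds I/O helper processes on top).
PARALLEL=2
P=4
echo "minimum bowtie2 threads in flight: $((PARALLEL * P))"
```

Keeping this product comfortably below the number of *physical* cores avoids oversubscription.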

[Two tables of benchmark timings attached as images]

CPU: Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz, 24 cores
Memory: 32 GB
System: Linux 91d674d7009b 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 GNU/Linux
Bowtie2 version: bowtie2-align-s version 2.4.4, 64-bit
Bismark version: v0.24.2

sfiligoi commented 1 week ago

@LawrenceLiu023 That's not my experience. Can you check that your process indeed has access to all the cores? E.g. by using

taskset -pc <PID>
LawrenceLiu023 commented 1 week ago

Thank you for your response. I have carefully observed CPU usage during my testing. When Bismark calls the bowtie2 alignment engine for methylation sequencing data, it runs 2 bowtie2 instances simultaneously. When I set bowtie2's `-p` parameter to a number n, I indeed observe 2n bowtie2 processes, and CPU utilization of about 2n × 100%.

However, in my tests on both a 24-core server and a 32-core server, when the -p parameter is set to n>4, the speed does not increase significantly, and even starts to decrease.

I would like to ask, what -p parameter do you typically use for bowtie2? Have you also observed a clear performance improvement as you increase the -p parameter?

I also found a post that mentioned a similar performance issue, where the -p parameter seems to not bring much performance improvement after a certain value. The link is: https://www.biostars.org/p/92366/.
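One plausible explanation for such a plateau (a hedged sketch, not something established in this thread) is Amdahl's law: if a fraction S of the pipeline is serial (e.g. gzip decompression or SAM writing), the best possible speedup at p threads is 1 / (S + (1 − S)/p), which flattens out no matter how large p gets.

```shell
# Amdahl's-law sketch with an assumed, purely illustrative serial
# fraction S. Speedup at p threads = 1 / (S + (1 - S)/p).
S=0.1
for P in 2 4 8 16 32; do
  awk -v s="$S" -v p="$P" \
    'BEGIN { printf "p=%2d  max speedup=%.2f\n", p, 1 / (s + (1 - s)/p) }'
done
```

With S = 0.1 the curve already caps out below 10x, so even modest serial overhead can explain diminishing returns from `-p`.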

sfiligoi commented 1 week ago

I saw good scalability up to `-p 16`. Note that your CPU has only 12 physical cores (×2 with hyper-threading), so that could explain why the time grows from that point on. Bowtie2 is also (mostly) memory bound, so its scalability is known to be limited more by memory bandwidth than by raw compute throughput.
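A quick way to see the physical vs. logical split (on Linux; `nproc` reports logical CPUs, which is 2x the physical core count when hyper-threading is on):

```shell
# Logical CPUs (what nproc and most tools report):
nproc

# Unique physical cores, from lscpu's parseable output
# (CORE column, de-duplicated across hyper-thread siblings):
lscpu -p=CORE,CPU | grep -v '^#' | cut -d, -f1 | sort -u | wc -l
```

If the second number is 12 while the first is 24, thread counts past 12 are sharing physical cores, which matches the observed slowdown.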

That said, I see you are using a rather old version of bowtie2 (2.4.4). There have been significant memory-access improvements in 2.5.0, which should help in your case. I would recommend trying the latest version.

PS: I will try to run a few benchmarks on my system and post the detailed results.

LawrenceLiu023 commented 1 week ago

Thank you for the additional information. I suspect that hyper-threading could indeed be a factor contributing to the performance plateau.

I installed bowtie2 using apt-get install bowtie2, so the version I have installed is the outdated 2.4.4 release. I will install the latest version and run some more benchmarks.
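One way to get a current bowtie2 without waiting for distribution packages (this assumes a conda/mamba setup with the bioconda channel available; it is not the only option, as prebuilt binaries are also published on the project's release page):

```shell
# Install bowtie2 from bioconda instead of the distro repository
# (assumes conda is installed and configured):
conda install -c conda-forge -c bioconda bowtie2

# Confirm the version actually picked up on PATH:
bowtie2 --version
```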

sfiligoi commented 1 week ago

Here are a few data points for my 5M reads run using WoLr1 as the reference database:

NTHREADS Runtime
 2      13:25 mins
 4       6:47 mins
 8       3:29 mins
12       2:23 mins
16       2:08 mins

My CPU is AMD EPYC 7302 16-Core Processor

The scaling does slow down close to the maximum core count, but it is almost linear up to `-p 12`.
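The "almost linear" claim can be checked directly from the table above by computing parallel efficiency relative to the `-p 2` baseline: speedup(p) = T(2)/T(p), efficiency(p) = speedup(p) × 2 / p.

```shell
# Speedup and efficiency from the runtimes posted above (converted to
# seconds), with p=2 as the baseline.
awk 'BEGIN {
  t[2]  = 13*60 + 25;  # 13:25
  t[4]  =  6*60 + 47;  #  6:47
  t[8]  =  3*60 + 29;  #  3:29
  t[12] =  2*60 + 23;  #  2:23
  t[16] =  2*60 +  8;  #  2:08
  n = split("4 8 12 16", ps, " ");
  for (i = 1; i <= n; i++) {
    p = ps[i]; s = t[2] / t[p];
    printf "p=%2d  speedup=%.2f  efficiency=%.0f%%\n", p, s, 100 * s * 2 / p;
  }
}'
```

This gives roughly 94% efficiency at `-p 12` but only about 79% at `-p 16`, consistent with near-linear scaling up to 12 threads on this 16-core CPU.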

For completeness: this was bowtie2 v2.5, and the command used was

$ /bin/time taskset -c 0-15 ./bowtie2 \
    --no-exact-upfront --no-1mm-upfront \
    -p ${NTHREADS} \
    -x /scratch/qp-woltka/WoLr1/WoLr1 \
    -q ${INFILE} -S ${OUTFILE} \
    --seed 42 --very-sensitive -k 16 \
    --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" \
    --score-min "L,0,-0.05" \
    --no-head --no-unal

4949790 reads; of these:
  4949790 (100.00%) were unpaired; of these:
    1493707 (30.18%) aligned 0 times
    1852338 (37.42%) aligned exactly 1 time
    1603745 (32.40%) aligned >1 times
69.82% overall alignment rate