Open iromeo opened 1 year ago
I've checked that Bowtie2 2.4.5 also works ok for me: memory usage is constant. The memory issue appears in 2.5.0.
Memory usage after ~14 h of running:
Hello,
To answer your question about the extra threads, as of v2.5.0 bowtie2 moved to an async model for reading and writing data. Each I/O operation uses its own thread, keeping the alignment threads mostly CPU bound.
The output thread does utilise a pair of buffers that have the potential of growing to a maximum of 16GB each. I pushed a change to the bug_fixes branch to cap the growth of the buffer at 16MB. Would you be willing to test that change?
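For readers following along, here is a minimal sketch of the general idea of a capped output buffer drained by a dedicated writer thread. It is not bowtie2's actual code: the class `CappedOutputBuffer`, its methods, and the `demo` helper are hypothetical names chosen for illustration, and only the 16MB figure comes from the comment above. Producers block once the fill buffer reaches the cap instead of letting it grow.

```python
import io
import threading

CAP_BYTES = 16 * 1024 * 1024  # 16MB cap from the comment above; all names below are hypothetical


class CappedOutputBuffer:
    """Illustrative bounded output buffer with one writer thread (not bowtie2's code)."""

    def __init__(self, sink, cap_bytes=CAP_BYTES):
        self._sink = sink
        self._cap = cap_bytes
        self._chunks = []      # buffer currently being filled by producers
        self._size = 0         # bytes held in the fill buffer
        self._closed = False
        self.peak = 0          # largest fill-buffer size observed
        self._cond = threading.Condition()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def write(self, data):
        with self._cond:
            # Block the producer while the buffer is full rather than growing it.
            # A single record larger than the cap is still accepted when the buffer is empty.
            while self._size > 0 and self._size + len(data) > self._cap:
                self._cond.wait()
            self._chunks.append(data)
            self._size += len(data)
            self.peak = max(self.peak, self._size)
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._writer.join()

    def _drain(self):
        while True:
            with self._cond:
                while not self._chunks and not self._closed:
                    self._cond.wait()
                if not self._chunks and self._closed:
                    return
                # Swap the filled buffer for an empty one so producers can keep
                # filling while the writer flushes (the "pair of buffers" idea).
                chunks, self._chunks = self._chunks, []
                self._size = 0
                self._cond.notify_all()
            for chunk in chunks:  # slow I/O happens outside the lock
                self._sink.write(chunk)


def demo(n_chunks=1000, chunk_size=1024, cap_bytes=8 * 1024):
    """Push n_chunks through a small cap; return (bytes written, peak buffer size)."""
    sink = io.BytesIO()
    buf = CappedOutputBuffer(sink, cap_bytes=cap_bytes)
    for i in range(n_chunks):
        buf.write(bytes([i % 256]) * chunk_size)
    buf.close()
    return len(sink.getvalue()), buf.peak
```

In this sketch, total memory held is bounded by roughly twice the cap (one buffer filling, one being flushed), regardless of how slow the sink is.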
@ch4rr0 yes, I can test it. Should I build the branch from sources?
If you’re able to, yes.
Hello,
Any updates on this?
@ch4rr0 Thx for the reminder. I've just built bowtie2 from the bug fix branch on our computation cluster and will share next week whether it helps or not.
@ch4rr0, everything works, I just tested bowtie2 built from the bug_fixes branch.
In the bug_fixes branch, memory usage is similar to what it was in the past and is constant over time.
Mem usage is around 4 GB per bowtie2 instance
Thanks for the development and maintenance of Bowtie2, I have been using it for years and I found it a great tool.
Currently I am using Bowtie2 2.5.3 on Ubuntu 22.04 to align a large (~ 670 M reads) unaligned bam file to a custom index consisting of Phix and rRNA sequences only. It is expected that most reads will fail to align, as the main purpose here is getting rid of Phix and rRNA reads. In fact, in the end I get a 5.72% alignment rate. I am observing the same issue with a linear growth of RAM consumption over time. Alignment starts with a tiny memory footprint (the index is ~ 9Mb on HD, the reference fasta is 103 Kb) but after about 30 minutes I get > 20 Gb RAM usage. Looking at the single processes in the system monitor, it seems clear that bowtie2 is taking this RAM, not the other commands in the pipe, which use only tiny amounts of RAM. I am using a bam input file and piping output to tee to save aligned and unaligned reads to different output files in bam format (using samtools -F 4 / -f 4). I am using the following bowtie2 options: -k 1 -p 15 --preserve-tags
bowtie2 --version
/home/valerio/.local/bin/bowtie2-align-s version 2.5.3 64-bit Built on a7ff140d06a4 Wed Jan 17 00:33:13 UTC 2024 Compiler: gcc version 9.3.1 20200408 (Red Hat 9.3.1-2) (GCC) Options: -O3 -msse2 -funroll-loops -g3 -g -O2 -fvisibility=hidden -I/hbb_exe_gc_hardened/include -ffunction-sections -fdata-sections -fstack-protector -D_FORTIFY_SOURCE=2 -fPIE -std=c++11 -Wall -Wno-unused-but-set-variable -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1 -DWITH_ZSTD Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
Thanks for your kind help,
Valerio
@flcvlr in my case the custom build from the bowtie2 bug fix branch resolves the memory leak. As an alternative, use 2.4.5 (or any version before 2.5.0).
@iromeo Yes, I reported it mainly because according to release notes of v. 2.5.3 (January 2024): [...]
I guess the bug fix branch you compiled, which fixed the issue in your case, was merged in v 2.5.3. However, since in my case the issue seems to be still there, maybe the maintainers would like to check whether I ran into a specific situation (e.g. reading bam files instead of fastq, or piping output, or using -k 1 or --preserve-tags) in which the issue persists even in v 2.5.3. Maybe this is not possible, but I am not familiar with the bowtie2 code and cannot really tell.
Hello,
Thank you for reporting this. I am curious if this is similar to @iromeo's issue or whether this is an issue with the BAM pattern source's interaction with the new async framework. Would you be willing to try out v2.4.5 as suggested by @iromeo? That should confirm the former.
Same with v 2.4.5. I just let it go for 7 minutes, and it was already taking 7.2 Gb. It started with ~300 Mb and then ramped at ~ 1Gb/minute.
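A growth rate like the one quoted above can be estimated from periodic memory samples. The following is a rough sketch, assuming a Linux or macOS system where `ps -o rss= -p PID` prints the resident set size in KiB; the function names `sample_rss_gib`, `growth_rate`, and `watch` are hypothetical helpers, not part of bowtie2.

```python
import subprocess
import time


def sample_rss_gib(pid):
    """Resident set size of `pid` in GiB, read via `ps -o rss=` (assumed to report KiB)."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip()) / (1024 * 1024)


def growth_rate(samples):
    """Least-squares slope of (minutes, memory) samples, in memory units per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one time point")
    return num / den


def watch(pid, interval_s=60, count=10):
    """Collect `count` (elapsed minutes, RSS in GiB) samples for a running process."""
    samples = []
    start = time.monotonic()
    for _ in range(count):
        samples.append(((time.monotonic() - start) / 60, sample_rss_gib(pid)))
        time.sleep(interval_s)
    return samples
```

Using the two figures reported above (about 0.3 Gb at start and 7.2 Gb after 7 minutes), `growth_rate([(0, 0.3), (7, 7.2)])` gives roughly 0.99 Gb per minute, consistent with the ~1 Gb/minute estimate. A leak-free run should give a slope near zero once the index is loaded.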
That rules out the async issue then. I pushed a change to the bug_fixes branch that should do some memory clean up after decompressing BGZF blocks. Would you be willing to test this change?
I can confirm that your change fixed the issue for me. I am now running bowtie2 compiled from the bug fixes branch on the same data with a constant RAM usage of ~140 Mb. Thanks!
I'm aligning PE WGBS data using the Bismark wrapper for bowtie2. Recently, I switched from bowtie2 2.4.2 to the latest 2.5.2 and ran into a memory issue.
(P.S: bowtie2 2.4.5 consumes constant memory; the issue was introduced in 2.5.0)
With Bowtie2 2.4.2, memory usage was constant, see:
Now with 2.5.2 it grows gradually with time, see:
Bismark launches command line:
I confirmed using top that the memory consumption of the bowtie2-align-s process rises with time.
Do you have any suggestions or ideas as to why this is happening? I could verify some hypotheses or provide additional info if needed.
Additionally, there seems to be a difference in the number of subprocesses launched by bowtie2. It also looks like bowtie2-align-s runs in 3 threads/processes in version 2.5.2 and only in 1 thread in 2.4.2.
Bowtie '2.5.2' + bismark 0.23.* after 7 hours:
Bowtie '2.4.2' + bismark 0.24.2 after 12 hours:
P.S: Is cause of https://github.com/FelixKrueger/Bismark/issues/638