FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Linear memory growth during alignment. #638

Open iromeo opened 7 months ago

iromeo commented 7 months ago

Hi,

I decided to re-align some data (I first aligned it ~1-2 years ago) and ran into memory issues. Previously memory usage was constant; now it grows linearly with time. I'm still investigating the issue and trying different tool versions.

Here is a memory usage screenshot for a large 60x depth WGBS library (short reads, BGI). Memory usage is less than 35 GB in parallel mode with 4 cores:

image

Tools versions:

channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bowtie2 ==2.4.2
  - bismark ==0.23.0
  - samtools ==1.9

But now it looks quite different for a library sequenced at 32x depth. Memory usage grows with time and the job fails on our cluster, likely because it hits the max-memory threshold.

image

Tools versions:

channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - bismark =0.24.2
  - bowtie2 =2.5.2
  - samtools =1.18

I'm still investigating and want to roll back to the older tool versions. But maybe you know why this happens?
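To put a number on "constant" versus "linear" memory usage, one option is to fit a straight line to periodic memory samples and look at the slope. The following is only a minimal sketch: the function name `growth_rate` and the sample values in the usage note are hypothetical, not taken from the runs above.

```python
from statistics import mean


def growth_rate(samples):
    """Least-squares slope of memory (GB) against time (hours).

    samples: list of (hours_elapsed, rss_gb) tuples, measured however you like
    (cluster monitoring, ps, etc.). Needs at least two distinct time points.
    """
    times = [t for t, _ in samples]
    mems = [m for _, m in samples]
    t_bar, m_bar = mean(times), mean(mems)
    # Sum of squared deviations of the time axis; zero means no time spread.
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need samples at two or more distinct times")
    # Covariance of time and memory divided by variance of time.
    return sum((t - t_bar) * (m - m_bar) for t, m in samples) / denom
```

For example, with made-up samples `[(0, 10.0), (1, 12.0), (2, 14.0)]` the slope is 2 GB per hour, while a flat profile gives a slope of 0. A slope that stays clearly positive over many hours would match the linear growth described here.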

FelixKrueger commented 7 months ago

This constant increase in memory consumption does indeed look weird. I would be very surprised if it came from Bismark itself; my guess is that Bowtie2 is the main suspect. Can you please post an update if and when you find out more?

iromeo commented 7 months ago

Can you please post

Ok

iromeo commented 7 months ago

Preliminary results suggest it is indeed Bowtie2:

image


Also, it looks like bowtie2-align-s runs with 3 threads/processes in version 2.5.2 and only 1 thread in 2.4.2.

Bowtie2 2.5.2 + Bismark 0.23.* after 7 hours: 2023-11-13_23-11-53

Bowtie2 2.4.2 + Bismark 0.24.2 after 12 hours: 2023-11-14_13-50-56
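For anyone who wants to reproduce the per-process comparison, here is a rough sketch of how thread count and resident memory of each `bowtie2-align-s` process could be sampled. It assumes a Linux system with the procps `ps` (the `nlwp`, `rss` and `comm` columns); the helper names `summarize` and `snapshot` are made up for illustration and are not part of Bismark or Bowtie2.

```python
import subprocess

# Assumption: procps ps on Linux; nlwp = thread count, rss = resident set size in KiB.
PS_CMD = ["ps", "-eo", "pid,nlwp,rss,comm"]


def summarize(ps_text, name="bowtie2-align-s"):
    """Return {pid: (threads, rss_mib)} for processes whose command equals `name`."""
    result = {}
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        pid, nlwp, rss_kib, comm = parts
        if comm.strip() == name:
            result[int(pid)] = (int(nlwp), int(rss_kib) / 1024)
    return result


def snapshot(name="bowtie2-align-s"):
    """Run ps once and summarize the matching processes."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    return summarize(out, name)
```

Calling `snapshot()` every few minutes during an alignment and logging the result should show whether the growth sits in the aligner processes, and how many threads each one is using under the two Bowtie2 versions.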