Bismark Alignment very slow

FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

http://felixkrueger.github.io/Bismark/

GNU General Public License v3.0


Bismark Alignment very slow #670

Closed Nitin123-4 closed 3 months ago

Nitin123-4 commented 6 months ago

Hi Team,

I am running Bismark with the below command. I can see it's really slow.

bismark --bowtie2 -N 1 --parallel 4 $RESOURCES2/HG38/ -1 $Read1 -2 $Read2 --output_dir $PWD --temp_dir $PWD/$SAMPLEID"_TEMP" --prefix $SAMPLEID

I have 218,289,382 total reads, i.e. 32.96 Gb of data. I pre-processed the reads with Trimmomatic and used the filtered reads for this run. It took ~22 h to complete. Can you please help with this?

Bismark Version: v0.24.0

FelixKrueger commented 6 months ago

Increasing the mismatches from the default (0) to -N 1 is probably slowing things down markedly. Lowering this and/or increasing --parallel are your best options.

Nitin123-4 commented 6 months ago

Thanks for the quick reply.

WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --parallel specified will effectively lead to a linear increase in compute and memory requirements, so --parallel 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned.

With --parallel 4 it is taking ~40GB RAM. If we increase it to --parallel 8 or 10 it will take a lot of memory.

I think this is difficult as it needs more RAM also.

Also, is it recommended to use 0 mismatches?
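To put rough numbers on the linear scaling described in the quoted warning, here is a minimal sketch. It extrapolates from the single data point in that warning (--parallel 4, ~20 cores, ~40GB RAM), so the per-instance figures are assumptions and not measured values; actual usage depends on the genome and the data.

```shell
#!/usr/bin/env bash
# Rough linear extrapolation from the quoted warning:
#   --parallel 4  ->  ~20 cores, ~40 GB RAM
# The per-instance values below are ASSUMPTIONS derived from that one data point.
CORES_PER_INSTANCE=5     # 20 cores / 4
RAM_GB_PER_INSTANCE=10   # 40 GB / 4

# Print the estimated footprint for a given --parallel value.
estimate_resources() {
  printf '%s\n' "--parallel $1: ~$(( $1 * $CORES_PER_INSTANCE )) cores, ~$(( $1 * $RAM_GB_PER_INSTANCE )) GB RAM"
}

for n in 4 8 10; do
  estimate_resources "$n"
done
```

Under these assumptions, --parallel 8 would need roughly 80 GB and --parallel 10 roughly 100 GB of RAM, which is why raising it is only an option on a large-memory node.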

FelixKrueger commented 6 months ago

This is from the Bowtie 2 manual:

-N <int>

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

To be honest, I don't think I have ever changed this to 1. It only really requires one multi-seed alignment somewhere in the read as an anchor, which in my experience is pretty much always the case.

Nitin123-4 commented 6 months ago

Okay. Thanks for your response.

Please advise on this: with --parallel 4 it is taking ~40GB RAM. If we increase it to --parallel 8 or 10, it will take a lot of memory.

Is there any solution for this?

FelixKrueger commented 6 months ago

I am afraid there isn't really a solution for this; if you ask for more memory it will also use more... You could potentially try to give each Bowtie 2 instance some additional cores (e.g. -p 4), but this will only get you so far, to be honest. It should not use much more memory though. This could result in the following command line:

bismark -p 4 --parallel 4 $RESOURCES2/HG38/ -1 $Read1 -2 $Read2 --output_dir $PWD --temp_dir $PWD/$SAMPLEID"_TEMP" --prefix $SAMPLEID

Nitin123-4 commented 6 months ago

Okay, so it means it will use 16 CPUs in total and ~40GB RAM?

FelixKrueger commented 6 months ago

If you open a second terminal and run top, you should be able to monitor usage statistics.
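As a complement to watching top interactively, the following sketch sums the resident memory of the alignment processes from ps output. The helper name sum_rss_gib and the process-name pattern are assumptions made for illustration; check with top which command names actually appear on your system.

```shell
#!/usr/bin/env bash
# Sum resident memory (RSS, which ps reports in KiB) over all processes whose
# command name matches a pattern, and print the total in GiB.
# Input on stdin: one "<rss_kib> <command>" pair per line.
# sum_rss_gib is a hypothetical helper, not part of Bismark.
sum_rss_gib() {
  awk -v pat="$1" '$2 ~ pat { total += $1 } END { printf "%.1f\n", total / 1048576 }'
}

# Usage (in a second terminal while the alignment is running).
# The pattern is an ASSUMPTION about how the processes are named:
#   ps -A -o rss= -o comm= | sum_rss_gib 'bismark|bowtie2|samtools'
```

Running this once at --parallel 4 and again after adding -p would show whether memory really stays roughly constant when only the thread count changes.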

mdsimguy commented 3 months ago

@FelixKrueger Is there a way to put all the temp files into memory? So far I can only get the initial C-T and A-G files to be written to /dev/shm/. Our clusters have 2TB of NVRAM, and that is not being used according to top. The temp.bam files are still being written to a filesystem that is really slow to read from and write to compared to the shared memory.

FelixKrueger commented 3 months ago

I am afraid there isn't any functionality to 'dump' temporary BAM records into memory.

mdsimguy commented 3 months ago

@FelixKrueger Thank you for the quick response. The cluster I am using has 64 CPUs across two cores. Is using -p >10 going to help in any way?

FelixKrueger commented 3 months ago

I personally don't tend to use -p at all, but if you've got some cores to spare, why not add it? You could always run a quick test with different values, e.g. -p 2, -p 3, -p 4, to see if it really makes a noticeable difference (maybe add time before the command, and -u 1000000 to only use a smallish subset).
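The quick test suggested above could be scripted roughly as follows. This is a sketch only: the genome folder, read file names and bench_p* output folders are placeholders, and the script prints the commands by default (DRY_RUN=1) so it can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Time Bismark on a 1,000,000-read subset (-u) for several -p values.
# All paths below are PLACEHOLDERS; override them via the environment.
GENOME_DIR="${GENOME_DIR:-/path/to/genome/}"
READ1="${READ1:-sample_R1.fastq.gz}"
READ2="${READ2:-sample_R2.fastq.gz}"
SUBSET="${SUBSET:-1000000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Run (or print) one timed test; each -p value gets its own output folder
# so the reports from different runs do not overwrite each other.
run_one() {
  p="$1"
  out_dir="bench_p${p}"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "time bismark -p $p -u $SUBSET $GENOME_DIR -1 $READ1 -2 $READ2 --output_dir $out_dir"
  else
    mkdir -p "$out_dir"
    time bismark -p "$p" -u "$SUBSET" "$GENOME_DIR" -1 "$READ1" -2 "$READ2" --output_dir "$out_dir"
  fi
}

for p in 2 3 4; do
  run_one "$p"
done
```

Set DRY_RUN=0 once the printed commands look right. Comparing the "real" times across the runs shows whether extra Bowtie 2 threads pay off on your hardware.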