FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Are there ways to decrease the memory requirement of coverage2cytosine at the expense of computation time? #650

Open onurcanbektas opened 5 months ago

onurcanbektas commented 5 months ago

Hi,

We sequence about 10x deeper than is typical for scNMTseq experiments. However, during the coverage2cytosine portion of the pipeline, I need at least 400GB RAM for each cell; otherwise the job fails because it runs out of memory. We have a few servers with this much RAM, but since we receive data from hundreds of cells and each cell takes about 5 hours, it takes weeks to process all of the cells one by one.

I was wondering whether there is a way to trade memory requirements for computation time. For example, if each cell took 1 day to process but required only 100GB RAM, I could process all cells at once, because we have many servers with at least 100GB RAM.
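As a rough illustration of the trade-off described above, the arithmetic can be sketched as follows. The cell count and server counts are hypothetical placeholders (the thread only says "hundreds of cells", "a few" 400GB servers and "many" 100GB servers), and the sketch assumes each server processes one cell at a time.

```python
import math


def wall_clock_hours(n_cells, hours_per_cell, n_servers):
    """Total wall-clock time when each server handles one cell at a time."""
    batches = math.ceil(n_cells / n_servers)
    return batches * hours_per_cell


# All numbers below are hypothetical placeholders, not measurements.
high_mem = wall_clock_hours(n_cells=300, hours_per_cell=5, n_servers=3)
low_mem = wall_clock_hours(n_cells=300, hours_per_cell=24, n_servers=50)

print(f"400GB setup: {high_mem} h (~{high_mem / 24:.1f} days)")
print(f"100GB setup: {low_mem} h (~{low_mem / 24:.1f} days)")
```

With these assumed numbers, the slower but lighter job would finish in about 6 days instead of about 3 weeks, simply because many more cells run in parallel.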

I use the following parameters for coverage2cytosine: --nome-seq --gc

FelixKrueger commented 4 months ago

Wow, that sounds like a huge amount of RAM. I don't think I have ever heard of such excessive amounts... In theory, coverage2cytosine should hold the genome in memory (typically some 3-4GB for the human or mouse genome), plus all positions that were covered on the current chromosome. Since this operation should run chromosome-by-chromosome, you should never really see the memory requirements go all that high... (also, 5h seems a bit on the slow side...)
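To get a feel for how many covered positions each chromosome contributes, one could count the lines per chromosome in the coverage file. The following is only a sketch: it assumes the usual tab-delimited Bismark coverage layout with the chromosome name in the first column, and the function name and example records are made up for illustration.

```python
from collections import Counter


def covered_positions_per_chromosome(lines):
    """Count coverage records per chromosome (first tab-separated column)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Hypothetical records: chromosome, start, end, % methylation,
# count methylated, count unmethylated.
example = [
    "chr1\t100\t100\t50.0\t1\t1",
    "chr1\t250\t250\t100.0\t2\t0",
    "chr2\t75\t75\t0.0\t0\t3",
]

for chrom, n in covered_positions_per_chromosome(example).most_common():
    print(chrom, n)
```

For a real, gzipped coverage file, the same function could be fed a handle opened with `gzip.open(path, "rt")`, which streams the file line by line instead of loading it into memory.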

Is there a way for you to monitor the memory consumption in more detail (as in: does it keep creeping up over time)? We had a quick look and found that the pidstat tool might be able to do this (with -r for memory, possibly combined with --interval?). Alternatively, could you provide me with a sample coverage file and the genome you used so I can try out some things myself?
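If pidstat is not available, a small script can do a similar job on Linux by polling the resident set size of the running process. This is a minimal sketch, assuming a Linux system where `/proc/<pid>/status` exposes a `VmRSS` line; the function names, the sampling interval and the "keeps growing" check are illustrative choices, not part of Bismark.

```python
import time


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # no VmRSS line found


def sample_rss_kb(pid, interval_s=60.0, n_samples=10):
    """Read VmRSS for `pid` n_samples times, sleeping interval_s in between."""
    samples = []
    for i in range(n_samples):
        with open(f"/proc/{pid}/status") as fh:
            samples.append(parse_vmrss_kb(fh.read()))
        if i < n_samples - 1:
            time.sleep(interval_s)
    return samples


def keeps_growing(samples):
    """True if every usable sample is larger than the one before it."""
    values = [s for s in samples if s is not None]
    return len(values) > 1 and all(b > a for a, b in zip(values, values[1:]))
```

Calling `sample_rss_kb(pid)` with the process ID of the running coverage2cytosine job and passing the result to `keeps_growing` would give a first indication of whether memory use climbs steadily or plateaus after the genome has been loaded.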

onurcanbektas commented 4 months ago

Dear Felix, thanks a lot for the prompt reply. I sent you an email with sample data and the genome.