cnobles / iGUIDE

Bioinformatic pipeline for identifying dsDNA breaks by marker based incorporation, such as breaks induced by designer nucleases like Cas9.
https://iguide.readthedocs.io/en/latest/
GNU General Public License v3.0

Pipeline exceeds memory specifications and fails #79

Closed ShanSabri closed 3 years ago

ShanSabri commented 3 years ago

Hi @cnobles,

I'm having issues trying to tame memory usage of the pipeline. I'm on a 128 GB (16-core) instance and I've set my memory configuration to be very generous:

# Memory Management (in MB units)
defaultMB : 6000
demultiMB : 40000
trimMB : 8000
filtMB : 4000
consolMB : 4000
alignMB : 20000
qualCtrlMB : 16000
assimilateMB : 16000
evaluateMB : 8000
reportMB : 4000

I am also running the pipeline with the --resources mem_mb flag, but memory usage still exceeds the limit:

iguide run IP102/configs/IP102.yml -- --notemp --nolock --cores 14 --restart-times 3 --latency-wait 180 --resources mem_mb=110000

I've pinpointed the pipeline failure to right after demux, when the fq files are written out. The files begin writing, but at the same time my memory usage increases significantly until the job is killed for exceeding 128 GB.

Here's a screenshot of htop right before the job is killed. Note that 123/124 GB is in use while only ~1.2 GB of demultiplexed fq files have been written out:

[htop screenshot]

Is there another way to reduce the memory usage?

iditbuch commented 3 years ago

Hi ShanSabri,

If your fq.gz files are tens of MBs, you may only need to run iGUIDE on a machine with 256 GB or even 512 GB of RAM. However, if your input fq.gz files are hundreds of MBs, you'd probably need a machine with 1 TB or even 2 TB of RAM. Another related issue, which I still haven't solved, is that an integer in R is only 32-bit, even on a 64-bit machine. This means there's a hard limit on the number of reads, or processed reads. But again, I'd suggest trying a machine with 512 GB of RAM first.

Idit
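For context on the integer limit mentioned above: R stores integers as signed 32-bit values regardless of machine word size, so any single count is capped at 2^31 - 1. A small Python illustration of that cap (the limit itself is a property of R, not of this snippet):

```python
# R's integer type is a signed 32-bit value even on 64-bit machines,
# so counts such as total read numbers are capped at 2**31 - 1.
# This is the value R reports as .Machine$integer.max.
R_INT_MAX = 2**31 - 1
print(R_INT_MAX)  # 2147483647, i.e. about 2.1 billion reads
```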

cnobles commented 3 years ago

Hi ShanSabri,

I ran into this issue when developing the pipeline as well. All my work was conducted on a 50-core, 200 GB memory machine. Yes, the scripts use a lot of memory if you try to push a large number of reads through at the same time; this can make the pipeline faster, but at higher demand. The alternative is to spread the reads across multiple cores, but since the memory pool is shared, the demand is still there unless you avoid running them all at the same time.

So I put a binning system into iGUIDE, which separates the reads of a given sample into a set number of bins. Each sample-bin is run as its own job through the pipeline. Combined with limiting the number of active cores, or increasing the number of bins to the point where their reads don't overwhelm your system, you should be able to process quite a bit. We were easily able to process MiSeq runs on the machine above.

As you can see though, I haven't given any numbers, since this should be tailored to your system. The default in the simulation config is 3 bins with a level of 250. This means each bin is filled up to 250 reads, and if there are leftover reads beyond that, all reads are instead distributed evenly among the three bins. For example, 1000 reads with a config specifying 3 bins and a level of 250 would produce 3 bins of 333, 333, and 334 reads. Those bins are then processed through the pipeline in separate jobs. If you can spread them out, you can process quickly; if you're memory-restricted, you can either process fewer at a time or make the bins small enough not to eat up your resources.
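The distribution described above can be sketched in Python (illustrative only; `distribute_reads` is a hypothetical helper, not the pipeline's actual code):

```python
def distribute_reads(total_reads, bins=3, level=250):
    """Fill each bin up to `level` reads; if the total exceeds
    bins * level, spread all reads evenly across the bins instead."""
    if total_reads <= bins * level:
        # Fill bins one at a time up to the level.
        sizes, remaining = [], total_reads
        for _ in range(bins):
            take = min(level, remaining)
            sizes.append(take)
            remaining -= take
        return sizes
    # Overflow: distribute evenly, giving the remainder to the last bins.
    base, extra = divmod(total_reads, bins)
    return [base + (1 if i >= bins - extra else 0) for i in range(bins)]

print(distribute_reads(1000, bins=3, level=250))  # [333, 333, 334]
```

With 1000 reads, 3 bins, and a level of 250, the total exceeds 3 x 250 = 750, so the reads spill into an even split of 333, 333, and 334.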

In case you were wondering, the bin and level parameters are in the config file under the #Binning section.
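For reference, that section of the config might look like the sketch below (I'm assuming the key names `bins` and `level` here; check your own config file for the exact spelling):

```yaml
# Binning
bins : 3       # number of sample-bins per sample
level : 250    # reads per bin before spilling over to an even split
```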

Idit has been using the software for quite some time, but I can't confirm the specs or limits suggested above in this thread.

Let me know if that helps resolve your issue.

ShanSabri commented 3 years ago

Hi @iditbuch + @cnobles: I ported over to a 512 GB instance and it ran fine. I also found that if I use half the cores on my 128 GB instance, it also runs okay, but takes nearly twice as long.

Thanks for the clarification!