RAM usage on targeted reads

OSU-SRLab / MANTIS

Microsatellite Analysis for Normal-Tumor InStability

GNU General Public License v3.0


RAM usage on targeted reads #24

Closed IvantheDugtrio closed 6 years ago

IvantheDugtrio commented 6 years ago

I noticed my processes would start failing after kmer_repeat_counter.py had run for several hours, because mantis.py would consume all of the available system memory plus the entire swap file. I ran this on a server with 88 processing threads and 192 GB of RAM, using the --threads 88 option.

I am analyzing targeted sequencing reads covering about 0.9 Mb, all CDS. I am also using a pool of normals as my normal control since we don't have the normal tissue for these samples.

Also, I generated my loci.bed file from the GRCh37.fa reference genome.

How much RAM does mantis.py need per thread?

rbonneville commented 6 years ago

88 threads is excessive for MANTIS; Amdahl's law will destroy any performance gains at this point. We recommend starting with 3 threads.
In our tests during development of MANTIS (see the MANTIS publication, table S2), we found that each thread uses about 10 MB of additional memory, on top of a 79 MB base. However, this depends on the number of microsatellite loci assessed.
The RepeatFinder tool generates a list of potential microsatellite regions across the entire genome. We recommend intersecting your loci.bed file with your sequencing panel's target regions, so that only loci covered by the panel are assessed.
As the MANTIS algorithm compares reads at each microsatellite locus between two samples (a tumor and normal), it fundamentally requires matched tissue. Because microsatellite lengths differ naturally among different people (accounting for their use in forensics), mismatched tumor and normal samples can produce false positives. This has been observed in our own testing.
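
To put the thread-count and memory figures above in perspective, here is a minimal sketch in Python. It applies the standard Amdahl's law formula and the linear memory estimate quoted from the publication (79 MB base plus about 10 MB per thread). The function names are hypothetical, and the parallel fraction of 0.9 is purely an illustrative assumption; the actual parallel fraction of MANTIS is not stated in this thread.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction is the share of the runtime that can be spread
    across threads; the remainder is assumed to run serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def estimated_memory_mb(threads, base_mb=79.0, per_thread_mb=10.0):
    """Linear estimate using the figures quoted above.

    The defaults come from the comment above; real usage also depends on
    the number of loci assessed, which this estimate ignores.
    """
    return base_mb + per_thread_mb * threads


if __name__ == "__main__":
    assumed_parallel_fraction = 0.9  # illustrative assumption, not measured
    for n in (1, 3, 8, 88):
        speedup = amdahl_speedup(assumed_parallel_fraction, n)
        memory = estimated_memory_mb(n)
        print(f"{n:>3} threads: speedup <= {speedup:.2f}x, ~{memory:.0f} MB")
```

Under that assumed fraction, the speedup can never exceed 10x regardless of thread count. The quoted per-thread figures by themselves also come to under 1 GB at 88 threads, which suggests that the size of the loci file, rather than the thread count alone, is the more likely reason for exhausting 192 GB.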

IvantheDugtrio commented 6 years ago

I reran the samples with an intersected bed file for my panel and with 3 threads used. So far it's running much faster and has no such memory bottlenecks.
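
For anyone repeating this step, the intersection can be done with a tool such as bedtools intersect, or with a few lines of Python. The sketch below is one possible approach, not part of MANTIS: it keeps every line of a loci BED file whose interval overlaps at least one panel target. The function names and file paths are hypothetical. It assumes tab-separated BED files with 0-based, half-open coordinates, and that both files use the same chromosome naming (for example, both "1" or both "chr1").

```python
from bisect import bisect_left
from collections import defaultdict


def load_panel(lines):
    """Build merged, sorted target intervals per chromosome from BED lines."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))

    panel = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if starts and start <= ends[-1]:
                # Overlapping or touching targets are merged into one.
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        panel[chrom] = (starts, ends)
    return panel


def overlaps_panel(panel, chrom, start, end):
    """Return True if [start, end) overlaps any merged target on chrom."""
    if chrom not in panel:
        return False
    starts, ends = panel[chrom]
    # Last merged target that begins before the locus ends.
    idx = bisect_left(starts, end) - 1
    return idx >= 0 and ends[idx] > start


def filter_loci(loci_lines, panel):
    """Yield loci lines unchanged when they overlap a panel target."""
    for line in loci_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if overlaps_panel(panel, fields[0], int(fields[1]), int(fields[2])):
            yield line


def write_intersection(loci_path, panel_path, out_path):
    """Write the loci from loci_path that fall on the panel to out_path."""
    with open(panel_path) as panel_file:
        panel = load_panel(panel_file)
    with open(loci_path) as loci_file, open(out_path, "w") as out_file:
        out_file.writelines(filter_loci(loci_file, panel))
```

A call such as write_intersection("loci.bed", "panel_targets.bed", "loci.panel.bed") would then produce the reduced loci file; the panel and output file names here are placeholders.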

Also, is there any notable difference between running this with Python 3 versus Python 2.7? Our compute cluster runs CentOS 7, on which it has been tricky to get Python 3 up and running.

rbonneville commented 6 years ago

There should be no notable difference between Python 2.7 vs Python 3. However, I must reiterate that comparing mismatched tumor and normal samples is very likely to produce a high rate of false-positive MSI calls.

rbonneville commented 6 years ago

Hello @IvantheDugtrio, do you have any further questions regarding this issue?

IvantheDugtrio commented 6 years ago

Hi @rbonneville, nah this seems to have been worked out. I'm closing this now.