Teezi closed this issue 2 months ago
Hi!
Yep, looks like a memory-related issue. Genomes in general push superFreq (or R, really) to its limits in terms of resources. It's possible to get runs through, but it requires more attention than exomes or RNA-Seq. It seems you made it pretty far in your run, so just a few tweaks should allow you to get it through.
SuperFreq memory usage scales with cpus (due to a flaw in R), so the easiest way to decrease memory usage is to decrease the cpu input parameter in superFreq(). So if you go down to 6 or 4 cpus (or worst case 2 cpus) and give it 440G, it's likely to go through, although a little slower.
There are regular save points in superFreq, and less stuff gets stuck in RAM when results are loaded from a save point, so sometimes it's enough to just resubmit a few times, and it'll get a couple of save points further each time...
Reducing the input variants from GATK is another way to go, but it should be a last resort, as you risk removing important variants. In your case, seeing that you already made it pretty far, I think just reducing cpus and rerunning (a couple of times) should be enough.
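The resubmit-a-few-times approach could be scripted roughly as below. This is only a sketch that assumes a Slurm cluster: `superfreq_job.sh` is a hypothetical batch script that runs the analysis, and three attempts is an arbitrary choice. Each job is queued with `--dependency=afterany`, so it starts once the previous one has ended, whether it crashed or finished, and resumes from whatever save points are already there.

```shell
#!/bin/bash
# Sketch: queue the same job several times so each attempt resumes from the
# save points left by the previous one.
# superfreq_job.sh is a hypothetical batch script; ATTEMPTS is arbitrary.
JOB_SCRIPT="superfreq_job.sh"
ATTEMPTS=3

submit_chain() {
    local prev="" i jobid
    for i in $(seq 1 "$ATTEMPTS"); do
        if [ -z "$prev" ]; then
            # First attempt: no dependency.
            jobid=$(sbatch --parsable "$JOB_SCRIPT")
        else
            # Later attempts wait until the previous job has ended in any state.
            jobid=$(sbatch --parsable --dependency=afterany:"$prev" "$JOB_SCRIPT")
        fi
        # --parsable may print "jobid;cluster"; keep only the job id.
        jobid="${jobid%%;*}"
        echo "attempt $i submitted as job $jobid"
        prev="$jobid"
    done
}

# Only submit when sbatch is available (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    submit_chain
else
    echo "sbatch not found; nothing submitted"
fi
```

If an early attempt finishes the run, the remaining jobs should only reload the saved results, and they can also be cancelled with `scancel`.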
Thanks for your rapid response!!
Understood... I will use 6 CPUs and have a go.
Regarding memory, can I specify a parameter within the superFreq() function, or is it set only with #SBATCH --mem=440GB (as I am using HPC)?
Regarding the regular save points, does this mean I don't need to delete the outputs in the R/ and plots/ directories for the specific failed sample when re-running the analysis?
The setting in SBATCH is how many cpus are allocated to the machine that you run on. The superFreq() setting sets how many cpus superFreq will actually use, and also affects memory use.
So changing the sbatch cpu setting isn't going to affect how much memory superFreq is using. You want to match the two to optimise use of your HPC resources.
Oh, sorry, memory... No, there is no memory setting in superFreq. It just goes and assumes there is enough...
And yes, the save points are in the R directory, so keep those to avoid rerunning from the start.
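To make the matching concrete, here is a rough sketch of a batch script, assuming a Slurm cluster. `run_superfreq.R` is a hypothetical wrapper script (not part of superFreq) that takes the cpu count as its first argument and forwards it to the cpu setting of superFreq(); the numbers are the ones discussed above. Memory is only requested through the `#SBATCH` line, since superFreq has no memory setting of its own.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=6   # cpus allocated to the job
#SBATCH --mem=440G          # memory is requested here only, not in superFreq()

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# default to 4 when the script is run outside Slurm.
CPUS="${SLURM_CPUS_PER_TASK:-4}"

# run_superfreq.R is a hypothetical wrapper that passes its first argument
# on as the cpu setting of superFreq(), so the two numbers always match.
# Leave the existing R/ directory in place so the save points are reused.
if [ -f run_superfreq.R ]; then
    Rscript run_superfreq.R "$CPUS"
else
    echo "run_superfreq.R not found; would have used $CPUS cpus"
fi
```

Reading the cpu count from the environment means only the `#SBATCH` line needs changing when dropping from 6 to 4 or 2 cpus.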
I see and will give it a shot. Thanks a lot !!
Good luck, let me know how it goes so I can close the issue.
Sure!
Hi Chris,
I was analysing whole-genome sequencing data using superFreq with 10 CPUs and 200G of memory. I used mutect2 raw outputs as the somatic inputs and haplotypeCaller raw outputs as the germline inputs. Some of my samples work very well, but others keep crashing with errors:
I suspected it was a memory-related issue. So I did the following:
What are your thoughts on this error? Could it be a memory issue? Do you think using the filtered outputs from MuTect2 and HaplotypeCaller would be beneficial?
Any insights will be much appreciated !!