droazen opened this issue 7 years ago
Does every ROI require this much memory or does the memory requirement fluctuate? If we are parallelizing runs, then at any given moment do all the sites being processed require this much memory? Can the memory across the threads be shared?
@sooheelee We're talking about peak memory usage here. If we can get the peak memory usage below certain thresholds, we can provision cheaper machines on the cloud for this part of the pipeline.
Memory across threads can be shared, yes, but not across separate processes.
I'd be interested in knowing the composition/characteristics of the sites where memory use peaks.
I think typically they are bad/repetitive regions of the genome (near the centromeres, for example) to which large numbers of reads get erroneously mapped. For HaplotypeCaller specifically, sites with large numbers of alleles / a complicated haplotype graph might also cause memory use and/or runtime to explode.
My understanding is that production excludes calling on the majority of such sites via their intervals list. So I'd be interested in knowing what fraction of all the sites that go through graph assembly are these high-memory sites. Also, what fraction of these may be due to alternate haplotypes as represented by the ALT contigs in GRCh38.
@sooheelee Might be a question for @yfarjoun
I'm pretty sure that we don't exclude any regions in the hg38 pipeline right now.
Yes we do, we run on a list of calling intervals that avoids empty/blackhole/timesuck regions.
Are you talking about b37 or hg38? I thought the only things missing in hg38 are where the reference is all Ns.
Hg38. Maybe you're right that it's only N regions -- I haven't actually looked.
That is correct. N's only (on the main contigs, not including Y and MT)
We looked into the slow regions and didn't find anything worth doing.
Thanks @eitanbanks for the clarification that our calling intervals use all regions in GRCh38 except Ns. So the peak-memory-use regions may or may not correspond to regions we previously excluded for b37 using intervals. But according to @yfarjoun, the slow regions were not slow enough to exclude for GRCh38. This reminds me--I believe GRCh38 was specifically designed in part to even out high-coverage pileups and, via the decoy sequences, soak up reads that would otherwise cause issues. So I would hypothesize that the profile of regions that peak memory use will be different for GRCh38 than for previous assemblies.
I'd be interested in confirming (i) whether peak-memory-use regions are the same or different across samples and (ii) the distribution of the peak-memory-use regions, e.g. 50% of regions requiring 50% more memory versus 10% of regions requiring 200% more memory than the mean.
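Question (ii) is easy to answer once per-region peak-memory numbers exist. A minimal sketch of the summary, using invented placeholder measurements (these are not real HaplotypeCaller numbers):

```python
# Sketch: summarize a hypothetical per-region peak-memory profile.
# The values below are illustrative placeholders, not real measurements.
from statistics import mean

# hypothetical peak memory (GB) observed per assembly region
peaks = [1.1, 1.2, 1.0, 1.3, 1.1, 4.2, 1.2, 6.8, 1.0, 1.1]

avg = mean(peaks)
over_50 = sum(p > 1.5 * avg for p in peaks) / len(peaks)   # >50% above mean
over_200 = sum(p > 3.0 * avg for p in peaks) / len(peaks)  # >200% above mean

print(f"mean peak: {avg:.2f} GB")
print(f"fraction of regions >50% above the mean:  {over_50:.0%}")
print(f"fraction of regions >200% above the mean: {over_200:.0%}")
```

With these toy numbers the mean is 2.0 GB, 20% of regions are more than 50% above the mean, and 10% are more than 200% above it; the real histogram would tell us whether a low default plus retries is viable.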
@jamesemery While you're in the HaplotypeCallerEngine doing optimizations, you should profile peak memory usage as well and see if we can get it down to < 3 GB. This would reduce costs by allowing us to use cheaper instances on the cloud.
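One cheap way to get that peak number without touching the engine is to measure it from outside the JVM. A sketch using POSIX `getrusage` on a child process (the child here is a stand-in for a `gatk HaplotypeCaller` invocation, which would be measured the same way):

```python
# Sketch: measure a child process's peak resident memory (RSS) from the
# outside. The child below is a stand-in that allocates ~50 MB; a real run
# would launch the gatk command line instead.
import resource
import subprocess
import sys

subprocess.run([sys.executable, "-c", "x = bytearray(50_000_000)"], check=True)

# ru_maxrss is in kilobytes on Linux (note: bytes on macOS)
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"child peak RSS: {peak_kb / 1024:.0f} MB")
```

This captures total process RSS (heap plus JVM overhead), which is what the cloud provisioner actually charges for, as opposed to just the Java heap.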
@droazen It's worth noting that the numbers/goals for this runtime are different now that PaPI V2 is being used more frequently. Since the maximum memory per CPU on a GCP custom machine is 6.5 GB, that is the absolute maximum memory a task can take without having to eat the cost of adding a second core. Any memory savings below 6.5 GB (not just < 3 GB) will still result in savings on PaPI V2 and is thus worthwhile.
Relates to #4272
@jamesemery As part of this, you should check whether Cromwell has implemented auto-retry with automatic memory doubling yet. It would be much easier to prove that we need ~3 GB or less in the typical case (and rely on automatic retry for pathological cases) than to prove that that amount is sufficient even in the worst-case scenario.
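The retry-with-doubling idea itself is simple; here is a minimal sketch of the policy, where `run_task` is a hypothetical stand-in for submitting a job with a given memory request (not Cromwell's actual API):

```python
# Sketch of retry-with-memory-doubling: size the request for the typical
# case and double only on failure, so pathological regions don't force
# every shard onto a big machine. run_task is a hypothetical job submitter.
def run_with_memory_retry(run_task, start_gb=3, max_gb=24):
    mem = start_gb
    while mem <= max_gb:
        if run_task(mem):
            return mem  # memory request (GB) that sufficed
        mem *= 2
    raise MemoryError(f"task failed even with {max_gb} GB")

# toy task that succeeds only with >= 10 GB
attempts = []
def needs_10gb(mem):
    attempts.append(mem)
    return mem >= 10

granted = run_with_memory_retry(needs_10gb)
print(granted)    # 12: the 3 GB and 6 GB attempts fail, 12 GB succeeds
print(attempts)   # [3, 6, 12]
```

The expected cost is then dominated by the typical-case request, with rare regions paying for one or two extra attempts.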
Ruchi has a branch for memory retries that I haven't tried yet, but it's definitely not standard in Cromwell.
A request from @eitanbanks and @yfarjoun :
"Yossi and I are just looking at our production processing costs and the HaplotypeCaller is the biggest culprit right now. That's because it currently requires these high memory machines. If we could somehow get it to use a max of 3 GB RAM then we'd cut 10% off of the entire pipeline. Even 6GB would be okay, but 3 would be huge. What do you think -- will it be possible?"