Why are you running VariantRecalibrator on multiple files? In the current implementation the tool reads all of the variants into memory, so merging the files beforehand would dramatically reduce the memory requirements.
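For what it's worth, here is a minimal sketch of one way to do such a merge, assuming the inputs are plain VCF shards and using GATK's MergeVcfs; the file names are placeholders, not from your run:

```bash
# Hypothetical example: combine several VCF shards into a single file so that
# VariantRecalibrator only needs to be pointed at one input.
# shard1.vcf.gz ... shard3.vcf.gz and merged.vcf.gz are placeholder names.
gatk MergeVcfs \
    -I shard1.vcf.gz \
    -I shard2.vcf.gz \
    -I shard3.vcf.gz \
    -O merged.vcf.gz
```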
I believe your issue is that you are assigning 600GB to the execution of Cromwell itself, but the error comes from the call to VariantRecalibrator in one of the tasks not having enough memory. A few tasks call VariantRecalibrator; do you know which one failed? Can you post the java call from its stderr file? For me it was the SNPsVariantRecalibrator task, which is assigned only 3.5GB of memory by default.
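If it helps to track that down, here is a rough sketch assuming Cromwell's default execution-directory layout; the workflow name, run ID, and task directory are placeholders:

```bash
# Find which task's output mentions the heap-space error; this assumes the
# default layout cromwell-executions/<workflow>/<run-id>/call-<task>/execution/stderr
grep -rl "Java heap space" cromwell-executions/

# The exact java invocation for the failed task is then in that task's stderr,
# e.g. (placeholder path):
# cromwell-executions/<workflow>/<run-id>/call-SNPsVariantRecalibrator/execution/stderr
```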
In joint-discovery-gatk4.wdl, the memory assigned to each task can be set via "machine_mem_gb", but it looks like the current input.json does not define that variable; instead it sets "mem_size" for each task.
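If the WDL version you are running does expose "machine_mem_gb" for that task, the override in the inputs JSON might look roughly like the sketch below; the workflow name ("JointGenotyping") and task name are assumptions and have to match whatever is actually declared in joint-discovery-gatk4.wdl:

```json
{
  "JointGenotyping.SNPsVariantRecalibrator.machine_mem_gb": 100
}
```

Cromwell accepts call-level overrides of the form "Workflow.Call.input" in the inputs JSON, provided the task declares that input.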
A simple solution would be to replace ${java_mem} with a static value in calls to VariantRecalibrator (lines 564 & 684). For example, replace:
${gatk_path} --java-options "-Xmx${java_mem}g -Xms${java_mem}g"
with
${gatk_path} --java-options "-Xmx100g -Xms100g"
I'm not certain this will help, but I think it's a step in the right direction.
Bug Report
I was running the JointDiscovery pipeline as part of the GATK Best Practices. I am running it on many VCF files (~150) called by HaplotypeCaller, and I am getting this error:
I believe this is derived from an error earlier in the log, since the stderr gives the same Java heap space error:

I have read past issues (https://gatkforums.broadinstitute.org/gatk/discussion/23880/java-heap-space) regarding this, which may suggest it is a bug. They point to increasing the available heap memory via -Xmx in the primary command. Is this the way to do it?
In that command I substitute in the corresponding config, JSON, and WDL files.
Is 600G enough? Each VCF is around 6G, and since I have 150 of them, does that mean I should be allocating more than 900G (6G x 150)?