Open Tintest opened 6 years ago
@Tintest Sorry for the slow reply. I'm not sure exactly what the issue is. I've never seen this exact error before.
I have two guesses. One is that there's something really weird going on in Spark that's causing that null pointer exception, which is killing the heartbeat. I'm not sure how to debug that without access to your input data.
The other theory which I think is more likely, is that you're running out of memory and it's causing weird errors to occur.
How much memory is available to your job? You can set the Java -Xmx value with --java-options "-Xmx120g" as a GATK option. I would check that you're not running out of memory on your machine, or giving the job too little. I think for BaseRecalibratorSpark you want at least 2-4 GB per core, but I haven't tested it in a long time, so I might be wrong about that.
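As a sketch, the heap suggestion above could be passed like this (file names are placeholders, not from this report; the thread uses the old gatk-launch wrapper, newer releases invoke gatk):

```shell
# Hypothetical sketch: raise the JVM heap for a GATK Spark tool via --java-options.
# reference.2bit, sample.bam, etc. are placeholder inputs; size -Xmx to your machine.
gatk-launch --java-options "-Xmx48g" BaseRecalibratorSpark \
    --spark-runner LOCAL --spark-master 'local[8]' \
    -R reference.2bit \
    -I sample.bam \
    --known-sites known_sites.vcf \
    -O recal_report.grp
```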
Thank you @lbergelson for your answer.
The memory I'm using is specified by Nextflow, but I could also force it at the Java level.
My machine has 64 GB of memory and 24 cores. 2 GB per core is OK, but more could be problematic. Maybe I will try on a more powerful machine and let you know how it goes.
Does the Spark strategy really need more memory? I've seen memory usage peak around 60 GB before the crash with the Spark tool, versus around 25 GB with the "classic" version (which completes the run without any issue).
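As a quick sanity check on the numbers above, a rough per-thread budget for this machine (hypothetical shell arithmetic, not a GATK command):

```shell
# Rough per-thread heap budget for a local[23] run on a 64 GB / 24-core machine.
# Integer division; real headroom is smaller once the OS and driver take their share.
total_gb=64
worker_threads=23
per_core_gb=$(( total_gb / worker_threads ))
echo "${per_core_gb} GB per worker thread"
```

With a ~60 GB peak observed before the crash, that 2 GB-per-thread budget leaves essentially no slack.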
Thank you.
I got the same problem while initializing the reference. The total memory is only 991428608 bytes (~945 MB):
org.broadinstitute.hellbender.tools.spark.BaseRecalibratorSpark done. Elapsed time: 0.13 minutes.
Runtime.totalMemory()=991428608
18/09/15 17:21:28 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.broadinstitute.hellbender.relocated.com.google.common.io.ByteStreams.copy(ByteStreams.java:74)
at org.broadinstitute.hellbender.relocated.com.google.common.io.ByteStreams.toByteArray(ByteStreams.java:115)
at org.broadinstitute.hellbender.engine.spark.datasources.ReferenceTwoBitSource.<init>(ReferenceTwoBitSource.java:40)
at org.broadinstitute.hellbender.engine.datasources.ReferenceMultiSource.<init>(ReferenceMultiSource.java:42)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReference(GATKSparkTool.java:500)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeToolInputs(GATKSparkTool.java:468)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:459)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:188)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:207)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
18/09/15 17:21:28 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.OutOfMemoryError: Java heap space)
18/09/15 17:21:28 INFO util.ShutdownHookManager: Shutdown hook called
18/09/15 17:21:28 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-test/nm-local-dir/usercache/test/appcache/application_1537046459822_0001/spark-d0ea2344-815c-4bbf-a11b-35031a9c4940
After increasing both the driver and executor memory to 12G, the problem went away.
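For reference, a sketch of how that 12G driver/executor bump might be passed on YARN. This is an assumption, not the commenter's exact command: in GATK4 Spark tools, spark-submit-style arguments typically go after a lone `--` separator, and the exact placement varies between releases; file names are placeholders.

```shell
# Hypothetical sketch: give both the driver and the executors 12g on YARN.
# Arguments after the bare "--" are forwarded to spark-submit.
gatk-launch BaseRecalibratorSpark \
    -R reference.2bit -I sample.bam \
    --known-sites known_sites.vcf \
    -O recal_report.grp \
    -- \
    --spark-runner SPARK --spark-master yarn \
    --driver-memory 12g --executor-memory 12g
```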
Hello, I'm trying to run GATK4 BaseRecalibratorSpark locally in a Nextflow pipeline, but I get the same error every time with this command:
gatk-launch BaseRecalibratorSpark --spark-runner LOCAL --spark-master local[23] -R $indexbit -I ${input_bam} --known-sites ${params.dbsnpAll} -L ${params.targetBed} -O ${output_name}-recalibration_report.grp
Here is the full error output. Basically, the failure seems to be linked to 12 seconds of inactivity from BaseRecalibratorSpark, which kills the Spark heartbeat and then the process:
I got the same error when I tried to execute it outside of Nextflow. I also tried running it with --conf spark.executor.heartbeatInterval=120, but it seems to be useless; I'm not sure it is the correct syntax for a local Spark execution.
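For what it's worth, Spark time properties usually take a unit suffix, so a sketch of that heartbeat tuning might look like this. The property names are standard Spark configuration, but whether they help with this failure is untested, and the file names are placeholders:

```shell
# Hypothetical sketch: lengthen Spark's heartbeat and network timeouts.
# spark.network.timeout must stay larger than spark.executor.heartbeatInterval.
gatk-launch BaseRecalibratorSpark \
    --spark-runner LOCAL --spark-master 'local[23]' \
    -R reference.2bit -I sample.bam \
    --known-sites known_sites.vcf -O recal_report.grp \
    --conf spark.executor.heartbeatInterval=120s \
    --conf spark.network.timeout=600s
```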
Attached is the complete log file: command.txt
Can you help me?
Also posted on the GATK forum
Best regards.