Open lbergelson opened 7 years ago
PrintVariantsSpark crashes on Dataproc with serialization issues; see the example log below.
@tomwhite When you get a chance, could you have a look at this?
Note that there is a disabled test in DataprocIntegrationTest
that should be re-enabled once this is fixed. (The test was introduced in https://github.com/broadinstitute/gatk/pull/3767).
@tomwhite Is this still an issue? If not, can we re-enable the disabled test in DataprocIntegrationTest
and close this?
Running:
/home/travis/gcloud/google-cloud-sdk/bin/gcloud dataproc jobs submit spark --cluster gatk-test-2495f43b-04fc-49e7-aa0a-7108cc876246 --properties spark.driver.userClassPathFirst=false,spark.io.compression.codec=lzf,spark.driver.maxResultSize=0,spark.executor.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 ,spark.driver.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 ,spark.kryoserializer.buffer.max=512m,spark.yarn.executor.memoryOverhead=600 --jar gs://hellbender-test-logs/staging/gatk-package-4.1.0.0-24-g18a95c7-SNAPSHOT-spark_3e9078b7e67707952fa12a0c5c4d2b71.jar -- PrintVariantsSpark --V gs://hellbender/test/resources/large/gvcfs/gatk3.7_30_ga4f720357.24_sample.21.expected.vcf --output gs://hellbender-test-logs/staging/12dc38b0-0b40-49d5-a98e-fe83ca658003.vcf --spark-master yarn
Job [654b5b8e01de4c60bd87d941d4ec8831] submitted.
Waiting for job output...
19/02/18 16:58:03 WARN org.apache.spark.SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
16:58:09.526 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
16:58:09.705 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/tmp/654b5b8e01de4c60bd87d941d4ec8831/gatk-package-4.1.0.0-24-g18a95c7-SNAPSHOT-spark_3e9078b7e67707952fa12a0c5c4d2b71.jar!/com/intel/gkl/native/libgkl_compression.so
16:58:10.112 INFO PrintVariantsSpark - ------------------------------------------------------------
16:58:10.113 INFO PrintVariantsSpark - The Genome Analysis Toolkit (GATK) v4.1.0.0-24-g18a95c7-SNAPSHOT
16:58:10.113 INFO PrintVariantsSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
16:58:10.113 INFO PrintVariantsSpark - Executing as root@gatk-test-2495f43b-04fc-49e7-aa0a-7108cc876246-m on Linux v4.9.0-8-amd64 amd64
16:58:10.114 INFO PrintVariantsSpark - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_181-8u181-b13-2~deb9u1-b13
16:58:10.114 INFO PrintVariantsSpark - Start Date/Time: February 18, 2019 4:58:09 PM UTC
16:58:10.114 INFO PrintVariantsSpark - ------------------------------------------------------------
16:58:10.114 INFO PrintVariantsSpark - ------------------------------------------------------------
16:58:10.115 INFO PrintVariantsSpark - HTSJDK Version: 2.18.2
16:58:10.115 INFO PrintVariantsSpark - Picard Version: 2.18.25
16:58:10.115 INFO PrintVariantsSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:58:10.116 INFO PrintVariantsSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:58:10.116 INFO PrintVariantsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
16:58:10.116 INFO PrintVariantsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:58:10.116 INFO PrintVariantsSpark - Deflater: IntelDeflater
16:58:10.116 INFO PrintVariantsSpark - Inflater: IntelInflater
16:58:10.116 INFO PrintVariantsSpark - GCS max retries/reopens: 20
16:58:10.116 INFO PrintVariantsSpark - Requester pays: disabled
16:58:10.116 WARN PrintVariantsSpark -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: PrintVariantsSpark is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
16:58:10.116 INFO PrintVariantsSpark - Initializing engine
16:58:10.116 INFO PrintVariantsSpark - Done initializing engine
19/02/18 16:58:10 WARN org.apache.spark.SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
19/02/18 16:58:10 INFO org.spark_project.jetty.util.log: Logging initialized @8431ms
19/02/18 16:58:11 INFO org.spark_project.jetty.server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
19/02/18 16:58:11 INFO org.spark_project.jetty.server.Server: Started @8536ms
19/02/18 16:58:11 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@45c90a05{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/02/18 16:58:11 WARN org.apache.spark.scheduler.FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
19/02/18 16:58:12 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at gatk-test-2495f43b-04fc-49e7-aa0a-7108cc876246-m/10.240.0.11:8032
19/02/18 16:58:13 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at gatk-test-2495f43b-04fc-49e7-aa0a-7108cc876246-m/10.240.0.11:10200
19/02/18 16:58:15 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1550508751046_0004
WARNING 2019-02-18 16:58:23 AsciiLineReader Creating an indexable source for an AsciiFeatureCodec using a stream that is neither a PositionalBufferedStream nor a BlockCompressedInputStream
WARNING 2019-02-18 16:58:23 AsciiLineReader Creating an indexable source for an AsciiFeatureCodec using a stream that is neither a PositionalBufferedStream nor a BlockCompressedInputStream
19/02/18 16:58:25 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input files to process : 1
19/02/18 16:58:29 ERROR org.apache.spark.scheduler.TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
genotypes (htsjdk.variant.variantcontext.VariantContext)
interval (org.broadinstitute.hellbender.engine.spark.SparkSharder$PartitionLocatable)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:396)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:307)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:362)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:88)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:72)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
at htsjdk.variant.variantcontext.LazyGenotypesContext.invalidateSampleOrdering(LazyGenotypesContext.java:205)
at htsjdk.variant.variantcontext.GenotypesContext.add(GenotypesContext.java:353)
at htsjdk.variant.variantcontext.GenotypesContext.add(GenotypesContext.java:46)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
... 18 more
19/02/18 16:58:29 INFO org.spark_project.jetty.server.AbstractConnector: Stopped Spark@45c90a05{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
16:58:29.970 INFO PrintVariantsSpark - Shutting down engine
[February 18, 2019 4:58:29 PM UTC] org.broadinstitute.hellbender.tools.spark.pipelines.PrintVariantsSpark done. Elapsed time: 0.34 minutes.
Runtime.totalMemory()=1106771968
org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
genotypes (htsjdk.variant.variantcontext.VariantContext)
interval (org.broadinstitute.hellbender.engine.spark.SparkSharder$PartitionLocatable)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:361)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.broadinstitute.hellbender.engine.spark.SparkSharder.computePartitionReadExtents(SparkSharder.java:388)
at org.broadinstitute.hellbender.engine.spark.SparkSharder.joinOverlapping(SparkSharder.java:212)
at org.broadinstitute.hellbender.engine.spark.SparkSharder.joinOverlapping(SparkSharder.java:205)
at org.broadinstitute.hellbender.engine.spark.SparkSharder.joinOverlapping(SparkSharder.java:153)
at org.broadinstitute.hellbender.engine.spark.SparkSharder.shard(SparkSharder.java:108)
at org.broadinstitute.hellbender.engine.spark.VariantWalkerSpark.getVariants(VariantWalkerSpark.java:127)
at org.broadinstitute.hellbender.engine.spark.VariantWalkerSpark.runTool(VariantWalkerSpark.java:154)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
at org.broadinstitute.hellbender.engine.spark.VariantWalkerSpark.runPipeline(VariantWalkerSpark.java:56)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [654b5b8e01de4c60bd87d941d4ec8831] entered state [ERROR] while waiting for [DONE].
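The trace shows what goes wrong: Kryo handles the genotypes field with its generic CollectionSerializer (GenotypesContext is a List), which writes only the elements and, on read, instantiates an empty collection and calls add() per element. For a LazyGenotypesContext, add() triggers invalidateSampleOrdering() and then decode(), but the lazy parser state that decode() needs was never serialized, hence the NullPointerException. A self-contained toy sketch of the same failure mode (LazyList and everything in it is invented for illustration; none of it is htsjdk code):
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class LazyCollectionKryoDemo {
    /** Toy stand-in for LazyGenotypesContext: elements are decoded on demand. */
    public static class LazyList extends AbstractList<String> {
        private String unparsedData;  // lazy state; CollectionSerializer never writes it
        private List<String> decoded;
        public LazyList(String unparsedData) { this.unparsedData = unparsedData; }
        public LazyList() { }  // zero-arg constructor so Kryo can instantiate the class
        private void decode() {
            if (decoded == null) {
                // NPE on the read side: unparsedData was never restored
                decoded = new ArrayList<>(Arrays.asList(unparsedData.split(",")));
            }
        }
        @Override public String get(int i) { decode(); return decoded.get(i); }
        @Override public int size() { decode(); return decoded.size(); }
        @Override public void add(int i, String s) { decode(); decoded.add(i, s); }
    }
    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        // Writing succeeds: the lazy state is present, so size()/get() can decode.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Output out = new Output(bytes)) {
            kryo.writeObject(out, new LazyList("a,b,c"));
        }
        // Reading fails: CollectionSerializer creates an empty LazyList (lazy state
        // null) and calls add() per element; add() forces decode() on the missing
        // state and throws NullPointerException, mirroring the Caused by line above.
        try (Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()))) {
            kryo.readObject(in, LazyList.class);
        }
    }
}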
I reproduced this on a cluster. I tried the following change, but then hit https://github.com/EsotericSoftware/kryo/issues/489
diff --git a/src/main/java/org/broadinstitute/hellbender/engine/spark/GATKRegistrator.java b/src/main/java/org/broadinstitute/hellbender/engine/spark/GATKRegistrator.java
index bff76a367..cabdc7332 100644
--- a/src/main/java/org/broadinstitute/hellbender/engine/spark/GATKRegistrator.java
+++ b/src/main/java/org/broadinstitute/hellbender/engine/spark/GATKRegistrator.java
@@ -2,10 +2,12 @@ package org.broadinstitute.hellbender.engine.spark;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.serializers.FieldSerializer;
+import com.esotericsoftware.kryo.serializers.JavaSerializer;
import com.google.common.collect.ImmutableMap;
import de.javakaffee.kryoserializers.UnmodifiableCollectionsSerializer;
import de.javakaffee.kryoserializers.guava.ImmutableMapSerializer;
import htsjdk.samtools.*;
+import htsjdk.variant.variantcontext.LazyGenotypesContext;
import org.apache.spark.serializer.KryoRegistrator;
import org.bdgenomics.adam.serialization.ADAMKryoRegistrator;
import org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils;
@@ -92,5 +94,8 @@ public class GATKRegistrator implements KryoRegistrator {
kryo.register(MarkDuplicatesSparkUtils.IndexPair.class, new FieldSerializer(kryo, MarkDuplicatesSparkUtils.IndexPair.class));
kryo.register(ReadsKey.class, new FieldSerializer(kryo, ReadsKey.class));
kryo.register(ReadsKey.KeyForFragment.class, new FieldSerializer(kryo, ReadsKey.KeyForFragment.class));
- kryo.register(ReadsKey.KeyForPair.class, new FieldSerializer(kryo, ReadsKey.KeyForPair.class)); }
+ kryo.register(ReadsKey.KeyForPair.class, new FieldSerializer(kryo, ReadsKey.KeyForPair.class));
+ // Use JavaSerializer for LazyGenotypesContext to handle transient fields correctly
+ kryo.register(LazyGenotypesContext.class, new JavaSerializer());
+ }
}
The workaround suggested in the Kryo issue referenced above (creating a local copy of JavaSerializer) works.
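For reference, a minimal sketch of what such a local copy can look like, assuming the class-loader behavior is the relevant part (per the discussion in the Kryo issue): an ObjectInputStream that resolves classes through kryo.getClassLoader() rather than the JDK default. The class name is mine, and this simplified version creates a fresh stream per object instead of the stock serializer's per-graph stream caching:
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoException;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
/** Local JavaSerializer variant that resolves classes via Kryo's class loader. */
public class ClassLoaderAwareJavaSerializer extends Serializer<Object> {
    @Override
    public void write(final Kryo kryo, final Output output, final Object object) {
        try {
            final ObjectOutputStream objectStream = new ObjectOutputStream(output);
            objectStream.writeObject(object);
            objectStream.flush();
        } catch (final IOException e) {
            throw new KryoException("Error during Java serialization.", e);
        }
    }
    @Override
    public Object read(final Kryo kryo, final Input input, final Class<Object> type) {
        try {
            final ObjectInputStream objectStream = new ObjectInputStream(input) {
                @Override
                protected Class<?> resolveClass(final ObjectStreamClass desc)
                        throws IOException, ClassNotFoundException {
                    // The one change from the stock JavaSerializer: look the class
                    // up with Kryo's class loader, not the default one.
                    return Class.forName(desc.getName(), false, kryo.getClassLoader());
                }
            };
            return objectStream.readObject();
        } catch (final Exception e) {
            throw new KryoException("Error during Java deserialization.", e);
        }
    }
}
GATKRegistrator would then do kryo.register(LazyGenotypesContext.class, new ClassLoaderAwareJavaSerializer()); in place of the stock JavaSerializer registration in the diff above.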
It also looks like https://github.com/EsotericSoftware/kryo/commit/19a6b5edee7125fbaf54c64084a8d0e13509920b may fix the problem, but I haven't tried it, and it isn't available until Spark 2.4.0 (I think).