
SortSamSpark Required array length is too large #8949

Open fo40225 opened 3 months ago

fo40225 commented 3 months ago

Bug Report

Affected tool(s) or class(es)

Tool/class name(s), special parameters?

SortSamSpark --sort-order coordinate

Affected version(s)

4.4.0.0

Description

Describe the problem below. Provide screenshots, stacktraces, and logs where appropriate.

An error occurs when using SortSamSpark to sort a large BAM file that contains only long reads (90x human WGS, minimum read length > 10 kbp). However, if the large BAM file contains short reads, the tool runs normally.

Steps to reproduce

Tell us how to reproduce this issue. If possible, include command lines that reproduce the problem. (The support team may follow up to ask you to upload data to reproduce the issue.)

sysctl -w vm.max_map_count=2147483642
gatk SortSamSpark \
 --input HG002-NA24385-GM24385.bam \
 --output HG002-NA24385-GM24385.sorted.bam \
 --sort-order coordinate \
 --java-options "-XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=96 -XX:+UseNUMA -XX:+UseZGC -Xmx1794G" \
 --tmp-dir . \
 -- \
 --spark-runner LOCAL --spark-master local[96] --conf spark.local.dir=./tmp --conf spark.port.maxRetries=61495

Expected behavior

Tell us what should happen

Output a sorted BAM file.

Actual behavior

Tell us what happens instead

java.lang.OutOfMemoryError: Required array length ? is too large

The last lines of the log file.

11:00:42.884 INFO  BlockManagerInfo - Removed taskresult_15758 on 172.20.19.130:43279 in memory (size: 10.5 MiB, free: 1076.2 GiB)
11:00:42.888 INFO  TaskSchedulerImpl - Removed TaskSet 0.0, whose tasks have all completed, from pool
11:00:42.902 INFO  DAGScheduler - ResultStage 0 (sortByKey at SparkUtils.java:165) finished in 1652.742 s
11:00:42.915 INFO  DAGScheduler - Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
11:00:42.916 INFO  TaskSchedulerImpl - Killing all running tasks in stage 0: Stage finished
11:00:42.927 INFO  DAGScheduler - Job 0 finished: sortByKey at SparkUtils.java:165, took 1653.133440 s
11:00:49.975 INFO  MemoryStore - Block broadcast_3 stored as values in memory (estimated size 2044.7 KiB, free 1076.2 GiB)
11:00:49.999 INFO  MemoryStore - Block broadcast_3_piece0 stored as bytes in memory (estimated size 56.6 KiB, free 1076.2 GiB)
11:00:49.999 INFO  BlockManagerInfo - Added broadcast_3_piece0 in memory on 172.20.19.130:43279 (size: 56.6 KiB, free: 1076.2 GiB)
11:00:50.000 INFO  SparkContext - Created broadcast 3 from broadcast at ReadsSparkSink.java:146
11:00:50.033 INFO  MemoryStore - Block broadcast_4 stored as values in memory (estimated size 2.1 MiB, free 1076.2 GiB)
11:00:50.045 INFO  MemoryStore - Block broadcast_4_piece0 stored as bytes in memory (estimated size 56.6 KiB, free 1076.2 GiB)
11:00:50.045 INFO  BlockManagerInfo - Added broadcast_4_piece0 in memory on 172.20.19.130:43279 (size: 56.6 KiB, free: 1076.2 GiB)
11:00:50.045 INFO  SparkContext - Created broadcast 4 from broadcast at BamSink.java:76
11:00:50.120 INFO  FileOutputCommitter - File Output Committer Algorithm version is 1
11:00:50.120 INFO  FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
11:00:50.177 INFO  SparkContext - Starting job: runJob at SparkHadoopWriter.scala:83
11:00:50.278 INFO  DAGScheduler - Registering RDD 14 (mapToPair at SparkUtils.java:161) as input to shuffle 0
11:00:50.291 INFO  DAGScheduler - Got job 1 (runJob at SparkHadoopWriter.scala:83) with 44262 output partitions
11:00:50.291 INFO  DAGScheduler - Final stage: ResultStage 2 (runJob at SparkHadoopWriter.scala:83)
11:00:50.291 INFO  DAGScheduler - Parents of final stage: List(ShuffleMapStage 1)
11:00:50.296 INFO  DAGScheduler - Missing parents: List(ShuffleMapStage 1)
11:00:50.300 INFO  DAGScheduler - Submitting ShuffleMapStage 1 (MapPartitionsRDD[14] at mapToPair at SparkUtils.java:161), which has no missing parents
11:00:53.974 INFO  TaskSchedulerImpl - Cancelling stage 1
11:00:53.974 INFO  TaskSchedulerImpl - Killing all running tasks in stage 1: Stage cancelled
11:00:53.975 INFO  DAGScheduler - ShuffleMapStage 1 (mapToPair at SparkUtils.java:161) failed in 3.609 s due to Job aborted due to stage failure: Task serialization failed: java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
        at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
        at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1862)
        at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:714)
        at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:160)
        at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
        at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
        at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:283)
        at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:165)
        at org.apache.spark.RangePartitioner.$anonfun$writeObject$1(Partitioner.scala:264)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1470)
        at org.apache.spark.RangePartitioner.writeObject(Partitioner.scala:254)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1070)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1516)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1501)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

11:00:53.977 INFO  DAGScheduler - Job 1 failed: runJob at SparkHadoopWriter.scala:83, took 3.799268 s
11:00:53.979 ERROR SparkHadoopWriter - Aborting job job_202408111100502620487673658411251_0021.
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
        at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
        at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1862)
        at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:714)
        at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:160)
        at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
        at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
        at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:283)
        at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:165)
        at org.apache.spark.RangePartitioner.$anonfun$writeObject$1(Partitioner.scala:264)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1470)
        at org.apache.spark.RangePartitioner.writeObject(Partitioner.scala:254)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1070)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1516)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1501)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1523) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.collection.immutable.List.foreach(List.scala:431) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2281) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:83) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1078) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1076) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopFile$2(PairRDDFunctions.scala:995) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:986) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:825) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.disq_bio.disq.impl.formats.bam.BamSink.save(BamSink.java:93) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.disq_bio.disq.HtsjdkReadsRddStorage.write(HtsjdkReadsRddStorage.java:233) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:155) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:119) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:374) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.tools.spark.pipelines.SortSamSpark.runTool(SortSamSpark.java:114) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:546) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.broadinstitute.hellbender.Main.main(Main.java:289) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
Caused by: java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
        at jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649) ~[?:?]
        at jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642) ~[?:?]
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100) ~[?:?]
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130) ~[?:?]
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1862) ~[?:?]
        at java.io.ObjectOutputStream.write(ObjectOutputStream.java:714) ~[?:?]
        at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:160) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at com.esotericsoftware.kryo.io.Output.flush(Output.java:185) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at com.esotericsoftware.kryo.io.Output.close(Output.java:196) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:283) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:165) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.RangePartitioner.$anonfun$writeObject$1(Partitioner.scala:264) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1470) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.RangePartitioner.writeObject(Partitioner.scala:254) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
        at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1070) ~[?:?]
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1516) ~[?:?]
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438) ~[?:?]
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181) ~[?:?]
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572) ~[?:?]
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529) ~[?:?]
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438) ~[?:?]
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181) ~[?:?]
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572) ~[?:?]
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529) ~[?:?]
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438) ~[?:?]
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181) ~[?:?]
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350) ~[?:?]
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1501) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at scala.collection.immutable.List.foreach(List.scala:431) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ~[gatk-package-4.4.0.0-local.jar:4.4.0.0]
11:00:54.078 INFO  AbstractConnector - Stopped Spark@2f829853{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
11:00:54.091 INFO  SparkUI - Stopped Spark web UI at http://172.20.19.130:4040
11:00:54.122 INFO  MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
11:00:54.175 INFO  MemoryStore - MemoryStore cleared
11:00:54.175 INFO  BlockManager - BlockManager stopped
11:00:54.193 INFO  BlockManagerMaster - BlockManagerMaster stopped
11:00:54.211 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
11:00:54.302 INFO  SparkContext - Successfully stopped SparkContext
11:00:54.303 INFO  SortSamSpark - Shutting down engine
[August 11, 2024 at 11:00:54 AM CST] org.broadinstitute.hellbender.tools.spark.pipelines.SortSamSpark done. Elapsed time: 27.81 minutes.
Runtime.totalMemory()=1926292832256
org.apache.spark.SparkException: Job aborted.
        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:106)
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1078)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1076)
        at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopFile$2(PairRDDFunctions.scala:995)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:986)
        at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:825)
        at org.disq_bio.disq.impl.formats.bam.BamSink.save(BamSink.java:93)
        at org.disq_bio.disq.HtsjdkReadsRddStorage.write(HtsjdkReadsRddStorage.java:233)
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:155)
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:119)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:374)
        at org.broadinstitute.hellbender.tools.spark.pipelines.SortSamSpark.runTool(SortSamSpark.java:114)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:546)
        at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
        at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
        at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1862)
        at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:714)
        at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:160)
        at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
        at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
        at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:283)
        at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:165)
        at org.apache.spark.RangePartitioner.$anonfun$writeObject$1(Partitioner.scala:264)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1470)
        at org.apache.spark.RangePartitioner.writeObject(Partitioner.scala:254)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1070)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1516)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1501)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1523)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2281)
        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:83)
        ... 27 more
Caused by: java.lang.OutOfMemoryError: Required array length 2147483639 + 798 is too large
        at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
        at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1862)
        at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:714)
        at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:160)
        at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
        at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
        at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:283)
        at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:165)
        at org.apache.spark.RangePartitioner.$anonfun$writeObject$1(Partitioner.scala:264)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1470)
        at org.apache.spark.RangePartitioner.writeObject(Partitioner.scala:254)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1070)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1516)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1572)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1529)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1438)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1501)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1329)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1332)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1331)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1331)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2810)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
11:00:54.334 INFO  ShutdownHookManager - Shutdown hook called
11:00:54.335 INFO  ShutdownHookManager - Deleting directory /raid/tmp/d6/c66ba827e22dbc38625af1cbc85adc/tmp/spark-f9c7c336-4e98-4fcc-855b-ba8a5a29e074

The first lines of the log file:

vm.max_map_count = 2147483642
Using GATK jar /Public/Everythings/misc/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=96 -XX:+UseNUMA -XX:+UseZGC -Xmx1794G -jar /Public/Everythings/misc/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar SortSamSpark --input HG002-NA24385-GM24385.bam --output HG002-NA24385-GM24385.sorted.bam --sort-order coordinate --tmp-dir . --spark-master local[96] --conf spark.local.dir=./tmp --conf spark.port.maxRetries=61495
Picked up JAVA_TOOL_OPTIONS: -XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=96 -XX:+UseNUMA -XX:+UseZGC
10:33:05.822 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Public/Everythings/misc/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:33:05.859 INFO  SortSamSpark - ------------------------------------------------------------
10:33:05.862 INFO  SortSamSpark - The Genome Analysis Toolkit (GATK) v4.4.0.0
10:33:05.862 INFO  SortSamSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
10:33:05.862 INFO  SortSamSpark - Executing as root@gs2040t on Linux v5.15.0-91-generic amd64
10:33:05.862 INFO  SortSamSpark - Java runtime: OpenJDK 64-Bit Server VM v17.0.9+8-LTS
10:33:05.862 INFO  SortSamSpark - Start Date/Time: August 11, 2024 at 10:33:05 AM CST
10:33:05.862 INFO  SortSamSpark - ------------------------------------------------------------
10:33:05.862 INFO  SortSamSpark - ------------------------------------------------------------
10:33:05.863 INFO  SortSamSpark - HTSJDK Version: 3.0.5
10:33:05.864 INFO  SortSamSpark - Picard Version: 3.0.0
10:33:05.864 INFO  SortSamSpark - Built for Spark Version: 3.3.1
10:33:05.864 INFO  SortSamSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:33:05.864 INFO  SortSamSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:33:05.864 INFO  SortSamSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:33:05.864 INFO  SortSamSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:33:05.864 INFO  SortSamSpark - Deflater: IntelDeflater
10:33:05.865 INFO  SortSamSpark - Inflater: IntelInflater
10:33:05.865 INFO  SortSamSpark - GCS max retries/reopens: 20
10:33:05.865 INFO  SortSamSpark - Requester pays: disabled
10:33:05.865 WARN  SortSamSpark -

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: SortSamSpark is a BETA tool and is not yet ready for use in production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

10:33:05.865 INFO  SortSamSpark - Initializing engine
10:33:05.865 INFO  SortSamSpark - Done initializing engine
10:33:06.134 WARN  Utils - Your hostname, gs2040t resolves to a loopback address: 127.0.1.1; using 172.20.19.130 instead (on interface bond0)
10:33:06.134 WARN  Utils - Set SPARK_LOCAL_IP if you need to bind to another address
10:33:06.242 INFO  SparkContext - Running Spark version 3.3.0
10:33:06.403 WARN  SparkConf - Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
10:33:06.427 INFO  ResourceUtils - ==============================================================
10:33:06.427 INFO  ResourceUtils - No custom resources configured for spark.driver.
10:33:06.428 INFO  ResourceUtils - ==============================================================
10:33:06.428 INFO  SparkContext - Submitted application: SortSamSpark
10:33:06.446 INFO  ResourceProfile - Default ResourceProfile created, executor resources: Map(memoryOverhead -> name: memoryOverhead, amount: 600, script: , vendor: , cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
10:33:06.454 INFO  ResourceProfile - Limiting resource is cpu
10:33:06.455 INFO  ResourceProfileManager - Added ResourceProfile id: 0
10:33:06.500 INFO  SecurityManager - Changing view acls to: root
10:33:06.501 INFO  SecurityManager - Changing modify acls to: root
10:33:06.501 INFO  SecurityManager - Changing view acls groups to:
10:33:06.502 INFO  SecurityManager - Changing modify acls groups to:
10:33:06.502 INFO  SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
10:33:06.755 INFO  Utils - Successfully started service 'sparkDriver' on port 34861.
10:33:06.784 INFO  SparkEnv - Registering MapOutputTracker
10:33:06.815 INFO  SparkEnv - Registering BlockManagerMaster
10:33:06.827 INFO  BlockManagerMasterEndpoint - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
10:33:06.828 INFO  BlockManagerMasterEndpoint - BlockManagerMasterEndpoint up
10:33:06.831 INFO  SparkEnv - Registering BlockManagerMasterHeartbeat
10:33:06.846 INFO  DiskBlockManager - Created local directory at /raid/tmp/d6/c66ba827e22dbc38625af1cbc85adc/tmp/blockmgr-8dc41ac8-6cf4-4424-9b15-7e2cbfc9e538
10:33:06.872 INFO  MemoryStore - MemoryStore started with capacity 1076.2 GiB
10:33:06.886 INFO  SparkEnv - Registering OutputCommitCoordinator
10:33:06.916 INFO  log - Logging initialized @3948ms to org.sparkproject.jetty.util.log.Slf4jLog
10:33:06.992 INFO  Server - jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 17.0.9+8-LTS
10:33:07.009 INFO  Server - Started @4042ms
10:33:07.080 INFO  AbstractConnector - Started ServerConnector@2f829853{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
10:33:07.081 INFO  Utils - Successfully started service 'SparkUI' on port 4040.
10:33:07.116 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@7074da1d{/,null,AVAILABLE,@Spark}
10:33:07.182 INFO  Executor - Starting executor ID driver on host 172.20.19.130
10:33:07.189 INFO  Executor - Starting executor with user classpath (userClassPathFirst = false): ''
10:33:07.208 INFO  Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43279.
10:33:07.208 INFO  NettyBlockTransferService - Server created on 172.20.19.130:43279
10:33:07.210 INFO  BlockManager - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
10:33:07.214 INFO  BlockManagerMaster - Registering BlockManager BlockManagerId(driver, 172.20.19.130, 43279, None)
10:33:07.221 INFO  BlockManagerMasterEndpoint - Registering block manager 172.20.19.130:43279 with 1076.2 GiB RAM, BlockManagerId(driver, 172.20.19.130, 43279, None)
10:33:07.225 INFO  BlockManagerMaster - Registered BlockManager BlockManagerId(driver, 172.20.19.130, 43279, None)
10:33:07.226 INFO  BlockManager - Initialized BlockManager: BlockManagerId(driver, 172.20.19.130, 43279, None)
10:33:07.345 INFO  ContextHandler - Stopped o.s.j.s.ServletContextHandler@7074da1d{/,null,STOPPED,@Spark}
10:33:07.347 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@6556471b{/jobs,null,AVAILABLE,@Spark}
10:33:07.349 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@7cdb05aa{/jobs/json,null,AVAILABLE,@Spark}
10:33:07.351 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@5cb76070{/jobs/job,null,AVAILABLE,@Spark}
10:33:07.352 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@443ac5b8{/jobs/job/json,null,AVAILABLE,@Spark}
10:33:07.354 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@753e4eb5{/stages,null,AVAILABLE,@Spark}
10:33:07.355 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@63318b56{/stages/json,null,AVAILABLE,@Spark}
10:33:07.357 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@462f8fe9{/stages/stage,null,AVAILABLE,@Spark}
10:33:07.358 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@b2e1df3{/stages/stage/json,null,AVAILABLE,@Spark}
10:33:07.359 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@6cf3b3d7{/stages/pool,null,AVAILABLE,@Spark}
10:33:07.360 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@55c20a91{/stages/pool/json,null,AVAILABLE,@Spark}
10:33:07.361 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@3ba96967{/storage,null,AVAILABLE,@Spark}
10:33:07.362 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@1237cade{/storage/json,null,AVAILABLE,@Spark}
10:33:07.363 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@4509b7{/storage/rdd,null,AVAILABLE,@Spark}
10:33:07.364 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@5dbc4598{/storage/rdd/json,null,AVAILABLE,@Spark}
10:33:07.365 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@38a27ace{/environment,null,AVAILABLE,@Spark}
10:33:07.366 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@7e8783b0{/environment/json,null,AVAILABLE,@Spark}
10:33:07.367 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@53d2f0ec{/executors,null,AVAILABLE,@Spark}
10:33:07.369 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@14d36bb2{/executors/json,null,AVAILABLE,@Spark}
10:33:07.370 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@4452e13c{/executors/threadDump,null,AVAILABLE,@Spark}
10:33:07.371 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@42172065{/executors/threadDump/json,null,AVAILABLE,@Spark}
10:33:07.380 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@8e77c5b{/static,null,AVAILABLE,@Spark}
10:33:07.380 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@49741274{/,null,AVAILABLE,@Spark}
10:33:07.382 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@3e5b2630{/api,null,AVAILABLE,@Spark}
10:33:07.383 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@1b6e4761{/jobs/job/kill,null,AVAILABLE,@Spark}
10:33:07.384 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@642ec6{/stages/stage/kill,null,AVAILABLE,@Spark}
10:33:07.389 INFO  ContextHandler - Started o.s.j.s.ServletContextHandler@3fe5ad73{/metrics/json,null,AVAILABLE,@Spark}
10:33:07.397 INFO  SortSamSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
10:33:07.450 INFO  GoogleHadoopFileSystemBase - GHFS version: 1.9.4-hadoop3
10:33:08.183 INFO  MemoryStore - Block broadcast_0 stored as values in memory (estimated size 268.7 KiB, free 1076.2 GiB)
10:33:08.581 INFO  MemoryStore - Block broadcast_0_piece0 stored as bytes in memory (estimated size 41.8 KiB, free 1076.2 GiB)
10:33:08.585 INFO  BlockManagerInfo - Added broadcast_0_piece0 in memory on 172.20.19.130:43279 (size: 41.8 KiB, free: 1076.2 GiB)
10:33:08.591 INFO  SparkContext - Created broadcast 0 from newAPIHadoopFile at PathSplitSource.java:96
10:33:09.126 INFO  MemoryStore - Block broadcast_1 stored as values in memory (estimated size 268.7 KiB, free 1076.2 GiB)
10:33:09.142 INFO  MemoryStore - Block broadcast_1_piece0 stored as bytes in memory (estimated size 41.8 KiB, free 1076.2 GiB)
10:33:09.144 INFO  BlockManagerInfo - Added broadcast_1_piece0 in memory on 172.20.19.130:43279 (size: 41.8 KiB, free: 1076.2 GiB)
10:33:09.145 INFO  SparkContext - Created broadcast 1 from newAPIHadoopFile at PathSplitSource.java:96
10:33:09.336 INFO  SortSamSpark - Using 44262 reducers
10:33:09.615 INFO  FileInputFormat - Total input files to process : 4791
10:33:09.793 INFO  SparkContext - Starting job: sortByKey at SparkUtils.java:165
10:33:09.849 INFO  DAGScheduler - Got job 0 (sortByKey at SparkUtils.java:165) with 15769 output partitions
10:33:09.850 INFO  DAGScheduler - Final stage: ResultStage 0 (sortByKey at SparkUtils.java:165)
10:33:09.850 INFO  DAGScheduler - Parents of final stage: List()
10:33:09.862 INFO  DAGScheduler - Missing parents: List()
10:33:09.869 INFO  DAGScheduler - Submitting ResultStage 0 (MapPartitionsRDD[16] at sortByKey at SparkUtils.java:165), which has no missing parents
10:33:10.193 INFO  MemoryStore - Block broadcast_2 stored as values in memory (estimated size 571.5 KiB, free 1076.2 GiB)
10:33:10.207 INFO  MemoryStore - Block broadcast_2_piece0 stored as bytes in memory (estimated size 214.8 KiB, free 1076.2 GiB)
10:33:10.208 INFO  BlockManagerInfo - Added broadcast_2_piece0 in memory on 172.20.19.130:43279 (size: 214.8 KiB, free: 1076.2 GiB)
10:33:10.209 INFO  SparkContext - Created broadcast 2 from broadcast at DAGScheduler.scala:1513
10:33:10.249 INFO  DAGScheduler - Submitting 15769 missing tasks from ResultStage 0 (MapPartitionsRDD[16] at sortByKey at SparkUtils.java:165) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
10:33:10.253 INFO  TaskSchedulerImpl - Adding task set 0.0 with 15769 tasks resource profile 0
gokalpcelik commented 3 months ago

Do you really need almost 2 terabytes of heap space (-Xmx1794G)? That is probably what is killing your process. Besides, we don't recommend using the experimental Spark tools for production or research purposes unless we say that it is fine to do so.

fo40225 commented 3 months ago

You've misunderstood the issue. My machine has 2 TB of memory, so -Xmx1794G is not the cause of the problem. When the original BAM file (447 GB) is used as input, SortSamSpark runs successfully. However, when the BAM file filtered with samtools view -e 'length(seq)>=10000' (434 GB) is used as input, SortSamSpark crashes. The file is a test file, not used for production or research purposes; I'm reporting this issue in the hope of improving GATK.
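
For reference, the filtering command was roughly of the following form (file names here are illustrative; only the -e expression is the one actually used):

# keep only reads whose sequence is at least 10 kbp; -b writes BAM output
samtools view -b \
 -e 'length(seq)>=10000' \
 -o HG002-NA24385-GM24385.longreads.bam \
 HG002-NA24385-GM24385.bam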

jonn-smith commented 2 months ago

@fo40225 that's interesting. The original file contains the reads that cause the filtered file to fail? I would have expected it to fail in both cases if it was an issue with the read lengths.

@gokalpcelik is correct about the Spark tools - they're Beta / Experimental tools, so we don't expect them to be stable on all inputs. You're probably running into an edge case we haven't seen before.

lbergelson commented 2 months ago

This is definitely a bug in the way serialization is handled, but it's hard to tell exactly where the issue is. Spark is trying to serialize something into a byte buffer, but it's trying to put in more bytes than fit in a Java array. If you could produce a very small BAM file that reliably reproduces this problem, we might be able to investigate it, but I don't have the bandwidth to really look into this right now; the Spark tools are a low priority at the moment. I would recommend sorting the file with the non-Spark SortSam for now. I'm sorry I don't have a better answer, but dealing with serialization issues is very often a huge can of worms.
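
For example, a non-Spark coordinate sort of the same input would look roughly like this (Picard-style arguments; the file names and heap size are illustrative):

# single-node, non-Spark coordinate sort using the Picard-based SortSam tool
gatk SortSam \
 -I HG002-NA24385-GM24385.bam \
 -O HG002-NA24385-GM24385.sorted.bam \
 -SO coordinate \
 --TMP_DIR . \
 --java-options "-Xmx64G"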

fo40225 commented 2 months ago

@jonn-smith The original BAM (containing short reads) will run normally. The filtered BAM (containing only long reads) will crash.

@lbergelson Is there a way to keep the intermediate files written under --conf spark.local.dir=./tmp? Perhaps I could extract a minimal BAM file that reliably reproduces this problem from them.
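
If the Spark temporary files can't be recovered, one possible way to build a small reproducer might be to subsample the long-read BAM with samtools and rerun the same SortSamSpark command on the result (the seed/fraction and file names below are illustrative):

# keep roughly 1% of the long-read alignments (seed 42), preserving the header
samtools view -b -s 42.01 \
 -o HG002-NA24385-GM24385.longreads.subsample.bam \
 HG002-NA24385-GM24385.longreads.bam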