broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.72k stars 594 forks source link

HaplotypeCallerSpark Failed because GenotypesCache thrown NPE #8961

Open Xuehai-Chen opened 3 months ago

Xuehai-Chen commented 3 months ago

Bug Report

Affected tool(s) or class(es)

HaplotypeCallerSpark

Affected version(s)

Description

spark task failed, here is the stack trace:

java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "cache" is null
    at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypesCache.ensureCapacity(GenotypesCache.java:84)
    at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypesCache.get(GenotypesCache.java:43)
    at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.makeGenotypeCall(GATKVariantContextUtils.java:341)
    at org.broadinstitute.hellbender.tools.walkers.genotyper.AlleleSubsettingUtils.subsetAlleles(AlleleSubsettingUtils.java:133)
    at org.broadinstitute.hellbender.tools.walkers.genotyper.AlleleSubsettingUtils.subsetAlleles(AlleleSubsettingUtils.java:48)
    at org.broadinstitute.hellbender.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:191)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:263)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:979)
    at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$assemblyFunction$0(HaplotypeCallerSpark.java:179)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1856)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
    at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at org.broadinstitute.hellbender.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1379)
    at org.broadinstitute.hellbender.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1395)
    at org.broadinstitute.hellbender.utils.iterators.PushToPullIterator.fillCache(PushToPullIterator.java:71)
    at org.broadinstitute.hellbender.utils.iterators.PushToPullIterator.advanceToNextElement(PushToPullIterator.java:58)
    at org.broadinstitute.hellbender.utils.iterators.PushToPullIterator.(PushToPullIterator.java:37)
    at org.broadinstitute.hellbender.utils.variant.writers.GVCFBlockCombiningIterator.(GVCFBlockCombiningIterator.java:14)
    at org.broadinstitute.hellbender.engine.spark.datasources.VariantsSparkSink.lambda$writeVariantsSingle$516343c4$1(VariantsSparkSink.java:127)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions$1(JavaRDDLike.scala:153)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:56)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:56)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:56)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:93)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Steps to reproduce

Run HaplotypeCallerSpark multiple times, it had a chance to fail. Looks like the method ensureCapacity of GenotypesCache is not synchronized. So when multiple task threads run into this method, the new added cache is not fully initialized.

Expected behavior

spark tasks success

Actual behavior

spark tasks failed

Xuehai-Chen commented 3 months ago

I think the root cause is the method ensureCapacity of GenotypesCache is not synchronized. So when multiple task threads run into this method, the new added cache is not fully initialized.

gokalpcelik commented 3 months ago

Hi @Xuehai-Chen HaplotypeCallerSpark is not developed regularly as the original HaplotypeCaller therefore it has its own quirks and issues present. It can be considered as an experimental tool/a conceptual tool to show that HaplotypeCaller may be accelerated using spark. It is not endorsed as a ready to be used tool for any purpose. Its development is not a high priority therefore we don't recommend using it.