broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

VariantAnnotator IndexOutOfBoundsException #8800

Status: Open. Opened by ykcchong 1 month ago.


Bug Report

Affected tool(s) or class(es)

VariantAnnotator

Affected version(s)

GATK 4.5.0.0 (HTSJDK 4.1.0, Picard 3.1.1)

Description

Using GATK jar /directory_masked/programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /directory_masked/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar VariantAnnotator -I ../test.bam -V test.vcf -O test_2.vcf --reference /directory_masked/refs/hg19/ucsc.hg19.fasta --enable-all-annotations true -jdk-deflater true -jdk-inflater true
14:02:45.344 INFO  VariantAnnotator - ------------------------------------------------------------
14:02:45.346 INFO  VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.5.0.0
14:02:45.346 INFO  VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
14:02:45.346 INFO  VariantAnnotator - Executing as username@hostname.local on Mac OS X v14.2 aarch64
14:02:45.346 INFO  VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v17.0.10+0
14:02:45.346 INFO  VariantAnnotator - Start Date/Time: April 30, 2024 at 2:02:45 PM HKT
14:02:45.346 INFO  VariantAnnotator - ------------------------------------------------------------
14:02:45.346 INFO  VariantAnnotator - ------------------------------------------------------------
14:02:45.347 INFO  VariantAnnotator - HTSJDK Version: 4.1.0
14:02:45.347 INFO  VariantAnnotator - Picard Version: 3.1.1
14:02:45.347 INFO  VariantAnnotator - Built for Spark Version: 3.5.0
14:02:45.348 INFO  VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:02:45.348 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:02:45.348 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:02:45.348 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:02:45.349 INFO  VariantAnnotator - Deflater: JdkDeflater
14:02:45.349 INFO  VariantAnnotator - Inflater: JdkInflater
14:02:45.349 INFO  VariantAnnotator - GCS max retries/reopens: 20
14:02:45.349 INFO  VariantAnnotator - Requester pays: disabled
14:02:45.349 INFO  VariantAnnotator - Initializing engine
14:02:45.425 INFO  FeatureManager - Using codec VCFCodec to read file file:///directory_masked/test.vcf
14:02:45.436 INFO  VariantAnnotator - Done initializing engine
14:02:45.459 INFO  ProgressMeter - Starting traversal
14:02:45.459 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
14:02:45.498 WARN  VariantAnnotatorEngine - Jumbo genotype annotations requested but fragment likelihoods or haplotype likelihoods were not given.
14:02:45.505 INFO  VariantAnnotator - Shutting down engine
[April 30, 2024 at 2:02:45 PM HKT] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=285212672
java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
        at java.base/java.util.Objects.checkIndex(Objects.java:361)
        at java.base/java.util.ArrayList.get(ArrayList.java:427)
        at java.base/java.util.Collections$UnmodifiableList.get(Collections.java:1347)
        at org.broadinstitute.hellbender.tools.walkers.annotator.AllelePseudoDepth$1.visit(AllelePseudoDepth.java:119)
        at org.apache.commons.math3.linear.Array2DRowRealMatrix.walkInRowOrder(Array2DRowRealMatrix.java:400)
        at org.apache.commons.math3.linear.AbstractRealMatrix.walkInOptimizedOrder(AbstractRealMatrix.java:879)
        at org.broadinstitute.hellbender.tools.walkers.annotator.AllelePseudoDepth.composeInputLikelihoodMatrix(AllelePseudoDepth.java:122)
        at org.broadinstitute.hellbender.tools.walkers.annotator.AllelePseudoDepth.annotate(AllelePseudoDepth.java:93)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.lambda$annotateGenotypes$6(VariantAnnotatorEngine.java:427)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateGenotypes(VariantAnnotatorEngine.java:427)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:360)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:243)
        at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
        at org.broadinstitute.hellbender.Main.main(Main.java:306)
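The innermost frames show `AllelePseudoDepth$1.visit` calling `get` on an unmodifiable list backed by an `ArrayList` that holds a single element. The minimal Java sketch below reproduces only that JDK part of the trace (`Collections$UnmodifiableList.get` delegating to `ArrayList.get`, which fails in `Objects.checkIndex`). The class name, the helper and the list contents are hypothetical placeholders; this is not GATK code and does not show what the list holds inside `AllelePseudoDepth`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableListIndexDemo {
    // Hypothetical helper: reads index `i` from an unmodifiable view over an
    // ArrayList of the given size and returns the exception message, or null
    // if the read succeeds.
    static String messageFor(int size, int i) {
        List<String> backing = new ArrayList<>();
        for (int k = 0; k < size; k++) {
            backing.add("element" + k); // placeholder contents
        }
        List<String> view = Collections.unmodifiableList(backing);
        try {
            view.get(i); // UnmodifiableList.get delegates to ArrayList.get
            return null;
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // On JDK 9+ (including the JDK 17 used here) this prints:
        // Index 1 out of bounds for length 1
        System.out.println(messageFor(1, 1));
    }
}
```

In other words, the annotation code expected at least two entries in that list but received one.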

Steps to reproduce

When the command is run on the original VCF (HaplotypeCaller output, left-aligned and trimmed, containing many variants), the program crashes and the output file is truncated.

The problem can be reproduced with a single variant and the corresponding BAM file:

gatk VariantAnnotator -I ../test.bam -V test.vcf -O test_2.vcf --reference ~/refs/hg19/ucsc.hg19.fasta --enable-all-annotations true -jdk-deflater true -jdk-inflater true

test.vcf is a HaplotypeCaller + LeftAlignAndTrimVariants VCF file containing a single variant:

chr8    145743102       .       C       A       37.32   .       AC=2;AF=1.00;AN=2;DP=2;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=18.66;SOR=0.693        GT:AD:DP:GQ:PL  1/1:0,2:2:6:49,6,0
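As a quick sanity check that this record is internally consistent, the Java sketch below parses the sample's FORMAT fields and checks them against standard VCF conventions for a diploid, biallelic site: AD has one value per allele (2), PL has one value per diploid genotype (A(A+1)/2 = 3 for A = 2 alleles), and GQ equals the gap between the two smallest PLs, capped at 99. The class and method names are hypothetical; the field values are copied from the record above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class VcfRecordCheck {
    // Parses a comma-separated integer field such as "0,2" or "49,6,0".
    static int[] parseInts(String field) {
        return Arrays.stream(field.split(",")).mapToInt(Integer::parseInt).toArray();
    }

    // Number of diploid genotypes (and hence PL values) for `alleleCount` alleles.
    static int diploidGenotypeCount(int alleleCount) {
        return alleleCount * (alleleCount + 1) / 2;
    }

    // GQ as the difference between the two smallest PLs, capped at 99.
    static int gqFromPl(int[] pl) {
        int[] sorted = pl.clone();
        Arrays.sort(sorted);
        return Math.min(99, sorted[1] - sorted[0]);
    }

    public static void main(String[] args) {
        String[] keys = "GT:AD:DP:GQ:PL".split(":");
        String[] values = "1/1:0,2:2:6:49,6,0".split(":");
        Map<String, String> sample = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            sample.put(keys[i], values[i]);
        }
        int alleleCount = 1 + "A".split(",").length; // REF (C) plus ALT alleles (A)
        int[] ad = parseInts(sample.get("AD"));
        int[] pl = parseInts(sample.get("PL"));
        System.out.println("alleles=" + alleleCount
                + " AD=" + ad.length + " PL=" + pl.length
                + " expectedPL=" + diploidGenotypeCount(alleleCount)
                + " GQ=" + gqFromPl(pl));
        // prints: alleles=2 AD=2 PL=3 expectedPL=3 GQ=6
    }
}
```

The computed GQ of 6 matches the GQ field in the record, so the input itself appears well-formed; the crash does not seem to come from a malformed genotype.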

test.bam is an hg19-aligned, duplicate-marked BAM file (372 KB, containing only reads associated with the site; it can be sent privately if necessary).

Troubleshooting steps done:

Expected behavior

An error or warning message, or the annotation being skipped for the problematic variant, rather than a crash.
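A minimal sketch of the "warn and skip" behavior described above, assuming (not confirmed from the GATK source) that the length-1 list in the trace holds one entry where the annotation expects one per allele of the biallelic site. `SkipInsteadOfCrash` and `perAlleleOrSkip` are hypothetical names; this is an illustration of a guard, not a patch to `AllelePseudoDepth`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

public class SkipInsteadOfCrash {
    private static final Logger LOG = Logger.getLogger(SkipInsteadOfCrash.class.getName());

    // Hypothetical helper: collect one entry per expected allele, but if the
    // source list is too short, log a warning and return empty instead of
    // letting List.get throw IndexOutOfBoundsException.
    static Optional<List<String>> perAlleleOrSkip(List<String> source, int expectedAlleles, String site) {
        if (source.size() < expectedAlleles) {
            LOG.warning("Skipping annotation at " + site + ": expected " + expectedAlleles
                    + " entries but found " + source.size());
            return Optional.empty();
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < expectedAlleles; i++) {
            result.add(source.get(i)); // safe: the size check above guarantees i < source.size()
        }
        return Optional.of(result);
    }

    public static void main(String[] args) {
        // One entry where two (REF C, ALT A) are expected, mirroring "length 1" in the trace.
        System.out.println(perAlleleOrSkip(List.of("C"), 2, "chr8:145743102").isPresent());
        System.out.println(perAlleleOrSkip(List.of("C", "A"), 2, "chr8:145743102").get());
    }
}
```

The first call logs a warning and prints `false`; the second prints `[C, A]`. Returning an empty `Optional` lets the caller leave the annotation off for that site while the remaining variants in the file are still written.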

Actual behavior

The tool crashes. When run on the original VCF containing many variants, the output file is truncated at the failing site.