broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

VariantAnnotator throws exception on multiallelic variant #6689

Open fleharty opened 4 years ago

fleharty commented 4 years ago

Bug Report

Affected tool(s) or class(es)

VariantAnnotator

Affected version(s)

Description

Throws an exception on a legal variant.

java.lang.IllegalStateException: Allele in genotype G not in the variant context [G*, G, GT]
    at htsjdk.variant.variantcontext.VariantContext$Validation.validateGenotypes(VariantContext.java:382)
    at htsjdk.variant.variantcontext.VariantContext$Validation.access$200(VariantContext.java:323)
    at htsjdk.variant.variantcontext.VariantContext$Validation$2.validate(VariantContext.java:331)
    at htsjdk.variant.variantcontext.VariantContext.lambda$validate$0(VariantContext.java:1384)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1384)
    at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:489)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
    at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.trimAlleles(GATKVariantContextUtils.java:1329)
    at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.trimAlleles(GATKVariantContextUtils.java:1285)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.getMinRepresentationBiallelics(VariantAnnotatorEngine.java:499)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:440)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:285)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:230)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

I realize this is an open source project, but I've made a copy of the failing VCF available at:
/dsde/working/fleharty/tmp/buggy.vcf
/dsde/working/fleharty/tmp/buggy.vcf.idx

Steps to reproduce

gatk VariantAnnotator -V buggy.vcf --resource:gnomad af-only-gnomad.raw.sites.vcf -E gnomad.AF --resource-allele-concordance -O gnomad_annotated.vcf

Expected behavior

VariantAnnotator should annotate the multiallelic variant and finish without throwing an exception.

Actual behavior

Throws the IllegalStateException shown in the description.

ldgauthier commented 3 years ago

This is probably complicated by a bug in the htsjdk warning from previous versions, which should be fixed in the latest master now. There's probably still a bug, but the error will be more informative now.

There may be a ploidy-related bug since the somatic genotypes are a little funky that way. I don't like the fact that this is calling a biallelic method.

@fleharty if you still care about this, can you run it again with the latest master?
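The stack trace shows the exception being raised while trimAlleles builds a new VariantContext from inside getMinRepresentationBiallelics. As a rough illustration of how that kind of validation can fail, here is a toy Python model. It is not htsjdk or GATK code, every name in it is hypothetical, and the pass-through remapping it uses is only an assumed failure mode; the actual cause has not been confirmed in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Allele:
    """Toy allele; equality includes the reference flag (an assumption of this model)."""
    bases: str
    is_ref: bool = False

    def __str__(self) -> str:
        # Mimic the "G*" notation used for the reference allele in the error message.
        return self.bases + ("*" if self.is_ref else "")


def trim_shared_suffix(alleles: List[Allele]) -> List[Allele]:
    """Drop trailing bases shared by every allele, keeping at least one base each."""
    bases = [a.bases for a in alleles]
    while all(len(b) > 1 for b in bases) and len({b[-1] for b in bases}) == 1:
        bases = [b[:-1] for b in bases]
    return [Allele(b, a.is_ref) for b, a in zip(bases, alleles)]


def validate_genotype(context: List[Allele], genotype: List[Allele]) -> None:
    """Reject any genotype allele that is not listed in the context."""
    for allele in genotype:
        if allele not in context:
            listed = ", ".join(str(a) for a in context)
            raise RuntimeError(
                f"Allele in genotype {allele} not in the variant context [{listed}]"
            )


def trimmed_biallelic(ref: Allele, alt: Allele, genotype: List[Allele]) -> List[Allele]:
    """Build the trimmed ref/alt pair for one alt of a multiallelic site."""
    subset = [ref, alt]
    trimmed = trim_shared_suffix(subset)
    renamed = dict(zip(subset, trimmed))
    # Assumed flaw: alleles outside the biallelic subset are passed through unchanged.
    remapped = [renamed.get(a, a) for a in genotype]
    validate_genotype(trimmed, remapped)
    return trimmed
```

In this model, a site with REF=CT and ALT=C,CTT whose sample carries C/CTT fails when the CTT alt is split out: the pair trims to [C*, CT], the genotype still holds the other alt C, and the check raises "Allele in genotype C not in the variant context [C*, CT]". A genotype that only uses alleles from the subset passes.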

Riad900 commented 2 years ago

Hi everyone. I am trying to run GATK 4.2.5.0 VariantAnnotator using gnomAD data. However, I get this error message: java.lang.IllegalStateException: Allele in genotype C not in the variant context [C*, CT]. Can you advise what's going on?

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx30G -jar /run/media/riadh/One Touch1/Analysis/gatk-4.2.4.1/gatk-package-4.2.5.0-local.jar VariantAnnotator -V PE69_chr3.vcf -R /run/media/riadh/One Touch/Reference_data_b38/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta --resource:gnomad /run/media/riadh/One Touch/Reference_data_b38/gnomad.genomes.v3.1.2.sites.chr3.vcf.bgz -E gnomad.nhomalt -E gnomad.ALT -E gnomad.AF -O PE69_ch3_vep_cadd_gnomad.vcf --resource-allele-concordance

10:58:19.715 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/run/media/riadh/One%20Touch1/Analysis/gatk-4.2.4.1/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 17, 2022 10:58:19 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:58:19.796 INFO VariantAnnotator - ------------------------------------------------------------
10:58:19.796 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.2.5.0
10:58:19.796 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
10:58:19.797 INFO VariantAnnotator - Executing as riadh@ikm-unix-1012.uio.no on Linux v5.16.12-200.fc35.x86_64 amd64
10:58:19.797 INFO VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v11.0.14.1+1
10:58:19.797 INFO VariantAnnotator - Start Date/Time: March 17, 2022 at 10:58:19 AM CET
10:58:19.797 INFO VariantAnnotator - ------------------------------------------------------------
10:58:19.797 INFO VariantAnnotator - ------------------------------------------------------------
10:58:19.797 INFO VariantAnnotator - HTSJDK Version: 2.24.1
10:58:19.797 INFO VariantAnnotator - Picard Version: 2.25.4
10:58:19.798 INFO VariantAnnotator - Built for Spark Version: 2.4.5
10:58:19.798 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:58:19.798 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:58:19.798 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:58:19.798 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:58:19.798 INFO VariantAnnotator - Deflater: IntelDeflater
10:58:19.798 INFO VariantAnnotator - Inflater: IntelInflater
10:58:19.798 INFO VariantAnnotator - GCS max retries/reopens: 20
10:58:19.798 INFO VariantAnnotator - Requester pays: disabled
10:58:19.798 INFO VariantAnnotator - Initializing engine
10:58:19.942 INFO FeatureManager - Using codec VCFCodec to read file file:///run/media/riadh/One%20Touch/Reference_data_b38/gnomad.genomes.v3.1.2.sites.chr3.vcf.bgz
10:58:19.971 INFO FeatureManager - Using codec VCFCodec to read file file:///run/media/riadh/My%20Book_From%20Eiklid/Analysis/gatk-4.2.4.1/ensembl-vep/PE69_chr3.vcf
10:58:20.063 INFO VariantAnnotator - Done initializing engine
10:58:20.091 WARN VariantAnnotatorEngine - The requested expression attribute "gnomad.ALT" is missing from the header in its resource file gnomad
10:58:20.140 INFO ProgressMeter - Starting traversal
10:58:20.140 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:58:42.160 INFO VariantAnnotator - Shutting down engine
[March 17, 2022 at 10:58:42 AM CET] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.37 minutes.
Runtime.totalMemory()=17158897664
java.lang.IllegalStateException: Allele in genotype C not in the variant context [C*, CT]
    at htsjdk.variant.variantcontext.VariantContext$Validation.validateGenotypes(VariantContext.java:382)
    at htsjdk.variant.variantcontext.VariantContext$Validation.access$200(VariantContext.java:323)
    at htsjdk.variant.variantcontext.VariantContext$Validation$2.validate(VariantContext.java:331)
    at htsjdk.variant.variantcontext.VariantContext.lambda$validate$0(VariantContext.java:1384)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1384)
    at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:489)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
    at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.trimAlleles(GATKVariantContextUtils.java:1464)
    at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.trimAlleles(GATKVariantContextUtils.java:1420)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.getMinRepresentationBiallelics(VariantAnnotatorEngine.java:568)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:509)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.addInfoAnnotations(VariantAnnotatorEngine.java:347)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:306)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:243)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
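
The log above does not say which record triggered the exception. Since the issue title points at multiallelic variants, one way to narrow the search is to list the multiallelic records in the input VCF and test them in smaller batches. The helper below is a minimal sketch using only the Python standard library; the function name is hypothetical and it only reads the first five tab-separated VCF columns.

```python
from typing import Iterable, Iterator, List, Tuple


def multiallelic_sites(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, List[str]]]:
    """Yield (chrom, pos, ref, alts) for VCF records with more than one ALT allele."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt = fields[:5]
        alts = alt.split(",")
        if len(alts) > 1:
            yield chrom, int(pos), ref, alts
```

It accepts any iterable of text lines, so an open file handle works, for example `multiallelic_sites(open("PE69_chr3.vcf"))`, as does `gzip.open(path, "rt")` for a compressed file.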

tmelman commented 1 year ago

Update: this issue is still happening. User ran GATK 4.4: https://gatk.broadinstitute.org/hc/en-us/community/posts/15706942393371-Error-when-running-VariantAnnotator

Here is a PR to deploy a bugfix for a similar issue in HaplotypeCaller. https://github.com/broadinstitute/gatk/pull/5365