broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

FilterFuncotations Error, ShouldNeverReachHereException, FuncotationMap #7865

Open GATKSupportTeam opened 2 years ago

GATKSupportTeam commented 2 years ago

This request was created from a contribution made by Joyce Anon on April 25, 2022 06:30 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/5573282748699-Error-ShouldNeverReachHereException-FuncotationMap-in-FilterFuncotations

--

FilterFuncotations stops with an error. The input VCF passes ValidateVariants against the reference genome with no errors. It looks like "FuncotationMap" doesn't have enough values to go with the keys (the error reports 31 values for 53 keys). I started with a .vcf file downloaded from Nebula Genomics and ran CNNScoreVariants, FilterVariantTranches (CNN_1D), and Funcotator in sequence, all with default settings.

I am trying to find the most pathogenic variants. I considered using FilterVcf to remove synonymous and intron variants, but it doesn't look like it can do that. So then I tried FilterFuncotations, but it returns an error. What I want is some way to sort the variants by severity, to find the most pathogenic ones, but I don't know how to do that.
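For removing synonymous and intron variants specifically, one rough option is to read the Funcotator classification directly from the annotated VCF. The sketch below is only an illustration and rests on several assumptions: that the FUNCOTATION header description lists its field names after "Funcotation fields are:", that each annotation group is wrapped in square brackets with `|` between values, and that synonymous and intronic variants carry the labels `SILENT` and `INTRON`. The function names and the `EXCLUDED` set are hypothetical, not part of GATK.

```python
import re

# Assumed classification labels to drop; adjust to whatever the VCF actually contains.
EXCLUDED = {"SILENT", "INTRON"}


def classification_index(header_line, suffix="_variantClassification"):
    """Find the position of the variant classification field in the header description."""
    match = re.search(r'Funcotation fields are:\s*([^"]+)"', header_line)
    if match is None:
        raise ValueError("no field list found in FUNCOTATION header line")
    fields = match.group(1).strip().split("|")
    for i, name in enumerate(fields):
        if name.endswith(suffix):
            return i
    raise ValueError("no variant classification field found")


def keep_record(info_field, index, excluded=EXCLUDED):
    """Keep a record if any annotation group has a classification outside `excluded`."""
    for entry in info_field.split(";"):
        if entry.startswith("FUNCOTATION="):
            for group in re.findall(r"\[([^\]]*)\]", entry):
                values = group.split("|")
                # Empty or unknown classifications are kept rather than dropped.
                if index < len(values) and values[index] not in excluded:
                    return True
            return False
    return True  # no FUNCOTATION attribute: leave the record untouched


def filter_vcf_lines(lines):
    """Yield header lines unchanged and only the data lines that pass keep_record."""
    index = None
    for line in lines:
        if line.startswith("#"):
            if line.startswith("##INFO=<ID=FUNCOTATION,"):
                index = classification_index(line)
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if index is None or len(cols) < 8 or keep_record(cols[7], index):
            yield line
```

This only filters; it does not rank variants by pathogenicity, and it will not behave sensibly on records whose FUNCOTATION attribute is malformed.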

GATK version: 4.2.6.1

Java runtime: OpenJDK 64-Bit Server VM v11.0.14.1+1-Ubuntu-0ubuntu1.20.04

Excerpt:

[April 25, 2022 at 2:00:35 AM EDT] org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations done. Elapsed time: 0.03 minutes.

Runtime.totalMemory()=319815680

org.broadinstitute.hellbender.exceptions.GATKException$ShouldNeverReachHereException: Cannot parse the funcotation attribute.  Num values: 31   Num keys: 53

Copied from the terminal:

(gatk) aru@BioinformaticsVM:/mnt/sdb/gatk$ ./gatk FilterFuncotations --allele-frequency-data-source gnomad -O ./output/nebulaFilterFuncotations.vcf --ref-version hg38 -V ./output/nebulaFuncotatorAnnotated.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

Using GATK jar /mnt/sdb/gatk/gatk-package-4.2.6.1-local.jar

Running:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /mnt/sdb/gatk/gatk-package-4.2.6.1-local.jar FilterFuncotations --allele-frequency-data-source gnomad -O ./output/nebulaFilterFuncotations.vcf --ref-version hg38 -V ./output/nebulaFuncotatorAnnotated.vcf

02:00:34.173 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/sdb/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

02:00:34.368 INFO  FilterFuncotations - ------------------------------------------------------------

02:00:34.369 INFO  FilterFuncotations - The Genome Analysis Toolkit (GATK) v4.2.6.1

02:00:34.369 INFO  FilterFuncotations - For support and documentation go to https://software.broadinstitute.org/gatk/

02:00:34.369 INFO  FilterFuncotations - Executing as aru@BioinformaticsVM on Linux v5.13.0-39-generic amd64

02:00:34.369 INFO  FilterFuncotations - Java runtime: OpenJDK 64-Bit Server VM v11.0.14.1+1-Ubuntu-0ubuntu1.20.04

02:00:34.369 INFO  FilterFuncotations - Start Date/Time: April 25, 2022 at 2:00:34 AM EDT

02:00:34.369 INFO  FilterFuncotations - ------------------------------------------------------------

02:00:34.369 INFO  FilterFuncotations - ------------------------------------------------------------

02:00:34.370 INFO  FilterFuncotations - HTSJDK Version: 2.24.1

02:00:34.371 INFO  FilterFuncotations - Picard Version: 2.27.1

02:00:34.371 INFO  FilterFuncotations - Built for Spark Version: 2.4.5

02:00:34.371 INFO  FilterFuncotations - HTSJDK Defaults.COMPRESSION_LEVEL : 2

02:00:34.371 INFO  FilterFuncotations - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

02:00:34.371 INFO  FilterFuncotations - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

02:00:34.371 INFO  FilterFuncotations - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

02:00:34.371 INFO  FilterFuncotations - Deflater: IntelDeflater

02:00:34.371 INFO  FilterFuncotations - Inflater: IntelInflater

02:00:34.371 INFO  FilterFuncotations - GCS max retries/reopens: 20

02:00:34.371 INFO  FilterFuncotations - Requester pays: disabled

02:00:34.372 WARN  FilterFuncotations - 

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: FilterFuncotations is an EXPERIMENTAL tool and should not be used for production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

02:00:34.372 INFO  FilterFuncotations - Initializing engine

02:00:34.518 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/sdb/gatk/./output/nebulaFuncotatorAnnotated.vcf

02:00:34.815 INFO  FilterFuncotations - Done initializing engine

02:00:35.260 INFO  ProgressMeter - Starting traversal

02:00:35.261 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute

02:00:35.262 INFO  FilterFuncotations - Starting pass 0 through the variants

02:00:35.778 ERROR FuncotationMap - Keys:  Gencode_34_hugoSymbol, Gencode_34_ncbiBuild, Gencode_34_chromosome, Gencode_34_start, Gencode_34_end, Gencode_34_variantClassification, Gencode_34_secondaryVariantClassification, Gencode_34_variantType, Gencode_34_refAllele, Gencode_34_tumorSeqAllele1, Gencode_34_tumorSeqAllele2, Gencode_34_genomeChange, Gencode_34_annotationTranscript, Gencode_34_transcriptStrand, Gencode_34_transcriptExon, Gencode_34_transcriptPos, Gencode_34_cDnaChange, Gencode_34_codonChange, Gencode_34_proteinChange, Gencode_34_gcContent, Gencode_34_referenceContext, Gencode_34_otherTranscripts, ACMGLMMLof_LOF_Mechanism, ACMGLMMLof_Mode_of_Inheritance, ACMGLMMLof_Notes, ACMG_recommendation_Disease_Name, ClinVar_VCF_AF_ESP, ClinVar_VCF_AF_EXAC, ClinVar_VCF_AF_TGP, ClinVar_VCF_ALLELEID, ClinVar_VCF_CLNDISDB, ClinVar_VCF_CLNDISDBINCL, ClinVar_VCF_CLNDN, ClinVar_VCF_CLNDNINCL, ClinVar_VCF_CLNHGVS, ClinVar_VCF_CLNREVSTAT, ClinVar_VCF_CLNSIG, ClinVar_VCF_CLNSIGCONF, ClinVar_VCF_CLNSIGINCL, ClinVar_VCF_CLNVC, ClinVar_VCF_CLNVCSO, ClinVar_VCF_CLNVI, ClinVar_VCF_DBVARID, ClinVar_VCF_GENEINFO, ClinVar_VCF_MC, ClinVar_VCF_ORIGIN, ClinVar_VCF_RS, ClinVar_VCF_SSR, ClinVar_VCF_ID, ClinVar_VCF_FILTER, LMMKnown_LMM_FLAGGED, LMMKnown_ID, LMMKnown_FILTER

02:00:35.778 ERROR FuncotationMap - Values:  , , , , , , , , , , , , , , , , , , , , , , , , , , , , false, , 

02:00:35.793 INFO  FilterFuncotations - Shutting down engine

[April 25, 2022 at 2:00:35 AM EDT] org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations done. Elapsed time: 0.03 minutes.

Runtime.totalMemory()=319815680

org.broadinstitute.hellbender.exceptions.GATKException$ShouldNeverReachHereException: Cannot parse the funcotation attribute.  Num values: 31   Num keys: 53

    at org.broadinstitute.hellbender.tools.funcotator.FuncotationMap.createAsAllTableFuncotationsFromVcf(FuncotationMap.java:224)

    at org.broadinstitute.hellbender.tools.funcotator.FuncotatorUtils.lambda$createAlleleToFuncotationMapFromFuncotationVcfAttribute$5(FuncotatorUtils.java:2256)

    at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:178)

    at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)

    at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)

    at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)

    at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:699)

    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)

    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)

    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)

    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)

    at org.broadinstitute.hellbender.tools.funcotator.FuncotatorUtils.createAlleleToFuncotationMapFromFuncotationVcfAttribute(FuncotatorUtils.java:2255)

    at org.broadinstitute.hellbender.tools.funcotator.filtrationRules.ArHetvarFilter.buildArHetByGene(ArHetvarFilter.java:77)

    at org.broadinstitute.hellbender.tools.funcotator.filtrationRules.ArHetvarFilter.firstPassApply(ArHetvarFilter.java:50)

    at org.broadinstitute.hellbender.tools.funcotator.FilterFuncotations.firstPassApply(FilterFuncotations.java:161)

    at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.nthPassApply(TwoPassVariantWalker.java:17)

    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverse$0(MultiplePassVariantWalker.java:40)

    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverseVariants$1(MultiplePassVariantWalker.java:77)

    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)

    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)

    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)

    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)

    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)

    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)

    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)

    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)

    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)

    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverseVariants(MultiplePassVariantWalker.java:75)

    at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:40)

    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)

    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

    at org.broadinstitute.hellbender.Main.main(Main.java:289)

(created from Zendesk ticket #282401)

droazen commented 2 years ago

@GATKSupportTeam Can you ask the user to provide an example FUNCOTATION attribute from their VCF? This error indicates that one or more of the FUNCOTATION attributes are malformed, so it would help to be able to inspect one.
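One way to locate a malformed FUNCOTATION attribute without reading the VCF by hand is to compare each annotation group's value count with the key count declared in the header. The sketch below is a minimal illustration, not GATK's own parsing logic. It assumes the header description lists the field names after "Funcotation fields are:", that groups are wrapped in square brackets, and that values are separated by `|`; the function names are hypothetical.

```python
import re


def funcotation_keys(header_lines):
    """Return the field names declared in the FUNCOTATION INFO header line."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=FUNCOTATION,"):
            match = re.search(r'Funcotation fields are:\s*([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return []


def split_funcotations(info_field):
    """Extract the bracketed annotation groups from one INFO column."""
    for entry in info_field.split(";"):
        if entry.startswith("FUNCOTATION="):
            return re.findall(r"\[([^\]]*)\]", entry[len("FUNCOTATION="):])
    return []


def find_malformed(vcf_lines):
    """Yield (chrom, pos, n_values, n_keys, group) where the counts disagree."""
    header = []
    keys = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            header.append(line)
            continue
        if keys is None:
            # The header is complete once the first data line is reached.
            keys = funcotation_keys(header)
        cols = line.split("\t")
        if len(cols) < 8:
            continue
        for group in split_funcotations(cols[7]):
            values = group.split("|")
            if len(values) != len(keys):
                yield cols[0], cols[1], len(values), len(keys), group
```

Passing an open file handle for the Funcotator output to `find_malformed` and printing the first few results should surface records like the one behind the "Num values: 31   Num keys: 53" message, if the assumptions about the layout hold. If the header has no FUNCOTATION line, every group is reported, since the key count is then zero.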

gbrandt6 commented 2 years ago

Yes, I have asked the user to upload their data for testing.

droazen commented 2 years ago

@gbrandt6 Has the user uploaded their data yet?

gbrandt6 commented 2 years ago

Yes, I got the data today.

gbrandt6 commented 1 year ago

The user's uploaded files are named Joyce1NebulaFuncotatorAnnotated.vcf and Joyce1NebulaFuncotatorAnnotated.vcf.idx.