broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.68k stars 588 forks source link

FilterMutectCalls in 4.2.0.0 has an error with AS_UNIQ_ALT_READ_COUNT annotation format #7298

Open GATKSupportTeam opened 3 years ago

GATKSupportTeam commented 3 years ago

It looks like this is a bug with 4.2.0.0 because the same Mutect2 output has no issues with FilterMutectCalls 4.1.6.0.

This request was created from a contribution made by Qihan Long on June 04, 2021 03:21 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360078236332-The-output-of-Mutect2-cannot-be-accepted-by-FilterMutectCalls-in-GATK-4-2-0-0

--

If you are seeing an error, please provide(REQUIRED) :

a) GATK version used: 4.2.0.0

b) Exact command used: 

gatk FilterMutectCalls \

-R /public1/data/resources/ref_genome/GRCh38/GRCh38.d1.vd1.fa \

-V somatic_mutation/Mutect2/test.vcf.gz \

-O somatic_mutation/FilterMutectCalls/test.vcf.gz

c) Entire error log:

I used the "--enable-all-annotations" option within Mutect2 to get a vcf file with abundant information. However, the following FilterMutectCalls step seemed to be intolerant of some information within previous step's vcf file record.

The intolerated record within vcf listed below: 

chr1 6197724 . C CT,CTT,CTTT . . AC=1,1,1;AF=0.167,0.167,0.167;AN=6;AS_BaseQRankSum=-6.431;AS_MQ=60.00,60.00,60.00;AS_MQRankSum=0.000;AS_ReadPosRankSum=5.751;AS_SB_TABLE=42,880|3,164|3,32|0,14;AS_UNIQ_ALT_READ_COUNT=167|35|14;BQHIST=5,1,0,0,1,11,2,0,0,0,14,2,0,0,1,15,1,0,0,0,16,1,0,0,0,17,0,2,0,0,18,2,0,0,1,19,6,0,1,0,20,25,0,2,2,21,13,0,1,2,22,20,0,3,3,23,2,1,2,1,24,6,0,2,0,25,21,1,4,7,26,33,0,5,6,27,18,0,0,7,28,29,0,0,4,29,26,2,4,8,30,161,4,5,51,31,263,2,3,51,32,129,2,3,22,33,41,0,0,0,34,15,0,0,0,35,20,0,0,0,36,19,0,0,0,37,12,0,0,0,38,1,0,0,0,39,9,0,0,0,41,18,0,0,0,44,26,0,0,0;BaseQRankSum=-6.431;ClippingRankSum=-7.714;DP=1323;ECNT=1;FS=0.000;LikelihoodRankSum=-7.886;MBQ=31,30,26,30;MFRL=6590,6585,4819,6586;MMQ=60,60,60,60;MPOS=16,15,7;MQ=59.98;MQ0=0;MQRankSum=0.000;NALOD=0.569,1.49,1.49;NCC=0;NCount=0;NLOD=27.80,30.51,30.97;OCM=0;POPAF=6.00,6.00,6.00;REF_BASES=GAACTTGCTTCTTTTTTTTGC;RPA=8,9,10,11;RU=T;ReadPosRankSum=5.751;SOR=1.152;STR;Samples=TCGA-NJ-A55R-01A-11R-A262-07;TLOD=284.47,51.82,3.50 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2/3:819,166,35,14:0.161,0.034,0.014:1034:365,76,17,4:440,87,17,8:16,803,6,209 0/0:103,1,0,0:0.017,8.250e-03,8.221e-03:104:50,1,0,0:52,0,0,0:26,77,0,1

The error log that FilterMutectCalls emited was listed below:

Using GATK jar /home/lqh/software/GATK-4.2.0.0/gatk-package-4.2.0.0-local.jar

Running:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/lqh/software/GATK-4.2.0.0/gatk-package-4.2.0.0-local.jar FilterMutectCalls -R /public1/data/resources/ref_genome/GRCh38/GRCh38.d1.vd1.fa -V somatic_mutation/Mutect2/test.vcf.gz -O somatic_mutation/FilterMutectCalls/test.vcf.gz

11:03:39.517 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/lqh/software/GATK-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so

Jun 04, 2021 11:03:49 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine

INFO: Failed to detect whether we are running on Google Compute Engine.

11:03:49.968 INFO FilterMutectCalls - ------------------------------------------------------------

11:03:49.969 INFO FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.2.0.0

11:03:49.969 INFO FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/

11:03:49.969 INFO FilterMutectCalls - Executing as lqh@master on Linux v5.6.14-1.el7.elrepo.x86_64 amd64

11:03:49.969 INFO FilterMutectCalls - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_152-b16

11:03:49.969 INFO FilterMutectCalls - Start Date/Time: June 4, 2021 11:03:39 AM CST

11:03:49.969 INFO FilterMutectCalls - ------------------------------------------------------------

11:03:49.969 INFO FilterMutectCalls - ------------------------------------------------------------

11:03:49.970 INFO FilterMutectCalls - HTSJDK Version: 2.24.0

11:03:49.970 INFO FilterMutectCalls - Picard Version: 2.25.0

11:03:49.970 INFO FilterMutectCalls - Built for Spark Version: 2.4.5

11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2

11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

11:03:49.970 INFO FilterMutectCalls - Deflater: IntelDeflater

11:03:49.971 INFO FilterMutectCalls - Inflater: IntelInflater

11:03:49.971 INFO FilterMutectCalls - GCS max retries/reopens: 20

11:03:49.971 INFO FilterMutectCalls - Requester pays: disabled

11:03:49.971 INFO FilterMutectCalls - Initializing engine

11:03:50.504 INFO FeatureManager - Using codec VCFCodec to read file file:///home/lqh/somatic_mutation/Mutect2/test.vcf.gz

11:03:50.696 INFO FilterMutectCalls - Done initializing engine

11:03:50.840 INFO ProgressMeter - Starting traversal

11:03:50.840 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute

11:03:50.841 INFO FilterMutectCalls - Starting pass 0 through the variants

11:03:51.014 INFO FilterMutectCalls - Shutting down engine

[June 4, 2021 11:03:51 AM CST] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.19 minutes.

Runtime.totalMemory()=625999872

java.lang.NumberFormatException: For input string: "167|35|14"

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

at java.lang.Integer.parseInt(Integer.java:580)

at java.lang.Integer.valueOf(Integer.java:766)

at htsjdk.variant.variantcontext.CommonInfo.lambda$getAttributeAsIntList$1(CommonInfo.java:288)

at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)

at java.util.Collections$2.tryAdvance(Collections.java:4717)

at java.util.Collections$2.forEachRemaining(Collections.java:4725)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)

at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)

at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)

at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)

at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsList(CommonInfo.java:274)

at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsIntList(CommonInfo.java:282)

at htsjdk.variant.variantcontext.VariantContext.getAttributeAsIntList(VariantContext.java:827)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.DuplicatedAltReadFilter.areAllelesArtifacts(DuplicatedAltReadFilter.java:26)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.HardAlleleFilter.calculateErrorProbabilityForAlleles(HardAlleleFilter.java:16)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2AlleleFilter.errorProbabilities(Mutect2AlleleFilter.java:86)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.lambda$new$0(ErrorProbabilities.java:27)

at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)

at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)

at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)

at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)

at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)

at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.(ErrorProbabilities.java:25)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.accumulateData(Mutect2FilteringEngine.java:138)

at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls.nthPassApply(FilterMutectCalls.java:154)

at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverse$0(MultiplePassVariantWalker.java:40)

at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverseVariants$1(MultiplePassVariantWalker.java:77)

at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)

at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)

at java.util.Iterator.forEachRemaining(Iterator.java:116)

at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)

at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)

at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)

at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)

at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)

at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverseVariants(MultiplePassVariantWalker.java:75)

at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:40)

at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)

at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)

at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)

at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)

at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

at org.broadinstitute.hellbender.Main.main(Main.java:289)

What's worth noted is that I tested FilterMutectCalls within GATK-4.1.6.0, it surly can accept the vcf record above and output normal records as below: 

chr1 6197724 . C CT,CTT,CTTT . multiallelic AC=1,1,1;AF=0.167,0.167,0.167;AN=6;AS_BaseQRankSum=-6.431;AS_MQ=60.00,60.00,60.00;AS_MQRankSum=0.000;AS_ReadPosRankSum=5.751;AS_SB_TABLE=42,880|3,164|3,32|0,14;AS_UNIQ_ALT_READ_COUNT=167|35|14;BQHIST=5,1,0,0,1,11,2,0,0,0,14,2,0,0,1,15,1,0,0,0,16,1,0,0,0,17,0,2,0,0,18,2,0,0,1,19,6,0,1,0,20,25,0,2,2,21,13,0,1,2,22,20,0,3,3,23,2,1,2,1,24,6,0,2,0,25,21,1,4,7,26,33,0,5,6,27,18,0,0,7,28,29,0,0,4,29,26,2,4,8,30,161,4,5,51,31,263,2,3,51,32,129,2,3,22,33,41,0,0,0,34,15,0,0,0,35,20,0,0,0,36,19,0,0,0,37,12,0,0,0,38,1,0,0,0,39,9,0,0,0,41,18,0,0,0,44,26,0,0,0;BaseQRankSum=-6.431;CONTQ=93;ClippingRankSum=-7.714;DP=1323;ECNT=1;FS=0.000;GERMQ=93;LikelihoodRankSum=-7.886;MBQ=31,30,26,30;MFRL=6590,6585,4819,6586;MMQ=60,60,60,60;MPOS=16,15,7;MQ=59.98;MQ0=0;MQRankSum=0.000;NALOD=0.569,1.49,1.49;NCC=0;NCount=0;NLOD=27.80,30.51,30.97;OCM=0;POPAF=6.00,6.00,6.00;REF_BASES=GAACTTGCTTCTTTTTTTTGC;RPA=8,9,10,11;RU=T;ReadPosRankSum=5.751;SEQQ=93;SOR=1.152;STR;STRANDQ=93;STRQ=21;Samples=TCGA-NJ-A55R-01A-11R-A262-07;TLOD=284.47,51.82,3.50 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2/3:819,166,35,14:0.161,0.034,0.014:1034:365,76,17,4:440,87,17,8:16,803,6,209 0/0:103,1,0,0:0.017,8.250e-03,8.221e-03:104:50,1,0,0:52,0,0,0:26,77,0,1

Much appreciated for your precious time to inspect my report, thanks!

(created from Zendesk ticket #163431)
gz#163431

gz#276553

(related to Zendesk ticket #276553)

gbrandt6 commented 3 years ago

The user found another issue with Mutect2 in GATK 4.2.0.0: the AS_MQ annotation contains a comma and cannot be parsed by other tools. I have recommended that the user work around this by leaving out this annotation but we should continue to look at a fix for these issues in GATK 4.2.0.0

fleharty commented 3 years ago

@davidbenjamin Could you take a look into this, sounds like an easy fix.

gbrandt6 commented 2 years ago

There is another issue with how this annotation interacts with germline calling, specifically an issue coming up in GenomicsDBImport when users add all annotations to their GVCF from HaplotypeCaller. Forum post: https://gatk.broadinstitute.org/hc/en-us/community/posts/4659308628891-GenomicsDBImport-error-from-fields-in-AS-UNIQ-ALT-READ-COUNT