broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.71k stars 591 forks source link

FilterMutectCalls when using --alleles yields java.lang.IllegalArgumentException: log10 p: Values must be non-infinite and non-NAN #7276

Open Ben-Habermeyer opened 3 years ago

Ben-Habermeyer commented 3 years ago

Bug Report

This is more of a question than an outright bug report. Is it expected that FilterMutectCalls should work on a VCF produced by Mutect2 run with the --alleles option? i.e. there exist homref calls in the mutect2 VCF?

When using the --alleles option to force calls at specific loci, an exception is raised on FilterMutectCalls. This issue was reported https://github.com/broadinstitute/gatk/issues/6237, and I have bumped GATK to several versions, including latest, but the issue persists. This exception is not raised when the --alleles flag is excluded.

Affected tool(s) or class(es)

Mutect2, FilterMutectCalls

Affected version(s)

Description

here is the stack trace

20:29:45.346 INFO  FilterMutectCalls - Done initializing engine
20:29:45.399 INFO  ProgressMeter - Starting traversal
20:29:45.399 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
20:29:45.400 INFO  FilterMutectCalls - Starting pass 0 through the variants
20:29:45.580 INFO  FilterMutectCalls - Finished pass 0 through the variants
20:29:45.630 INFO  FilterMutectCalls - Shutting down engine
[May 25, 2021 8:29:45 PM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=462946304
java.lang.IllegalArgumentException: log10 p: Values must be non-infinite and non-NAN
        at org.broadinstitute.hellbender.utils.NaturalLogUtils.logSumExp(NaturalLogUtils.java:84)
        at org.broadinstitute.hellbender.utils.NaturalLogUtils.normalizeLog(NaturalLogUtils.java:51)
        at org.broadinstitute.hellbender.utils.NaturalLogUtils.normalizeFromLogToLinearSpace(NaturalLogUtils.java:27)
        at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.posteriorProbabilityOfError(Mutect2FilteringEngine.java:93)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.probabilityOfSequencingError(SomaticClusteringModel.java:140)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.probabilityOfSomaticVariant(SomaticClusteringModel.java:146)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.performEMIteration(SomaticClusteringModel.java:345)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.learnAndClearAccumulatedData(SomaticClusteringModel.java:330)
        at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.learnParameters(Mutect2FilteringEngine.java:153)
        at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls.afterNthPass(FilterMutectCalls.java:165)
        at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:44)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Steps to reproduce

Here are the commands run. Can provide additional details if needed.

        gatk \
        --java-options "-Xmx32g" \
        Mutect2 \
        -R $ref_fasta \
        -I $in_bam \
        -L $region \
        -O $out_mutect2_vcf \
        --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
        --max-reads-per-alignment-start 0 \
        --native-pair-hmm-threads 16 \
        --alleles $roi

        gatk FilterMutectCalls \
        -R $ref_fasta \
        -V $out_mutect2_vcf \
        -O $out_mutect2_filtered_vcf

Expected behavior

FilterMutectCalls filters the mutect2 VCF.

Actual behavior

FilterMutectCalls fails with java.lang.IllegalArgumentException: log10 p: Values must be non-infinite and non-NAN

fleharty commented 3 years ago

@davidbenjamin Could you take a look at this?

Ben-Habermeyer commented 3 years ago

Any insight on this?

davidbenjamin commented 2 years ago

@Ben-Habermeyer We had a few PRs in late 2021 that may have fixed this. If it's still occurring in the latest GATK version I would like to take a look at it.

Ben-Habermeyer commented 2 years ago

@davidbenjamin Thank you for getting back to me. I will upgrade to the latest version in the next few weeks and let you know how it turns out!

Ben-Habermeyer commented 2 years ago

@davidbenjamin going to upgrade to latest this week and test it out FWIW with 4.1.1.0 this appears to be an issue for calls output from Mutect2 with very low DP.

E.g. the following VCF raises this exception for FilterMutectCalls

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NEC-2b
chr5    85458177    .   A   G   .   .   DP=1;ECNT=1;MBQ=0,90;MFRL=0,171;MMQ=60,60;MPOS=17;POPAF=7.30;TLOD=4.20  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,1:0,0:0,0,1,0
chr13   38648250    .   A   G   .   .   DP=2;ECNT=1;MBQ=0,37;MFRL=0,47;MMQ=60,60;MPOS=32;POPAF=7.30;TLOD=7.40   GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,2:0.750:2:0,2:0,0:0,0,2,0
chr16   10195584    .   C   T   .   .   DP=1;ECNT=1;MBQ=0,90;MFRL=0,159;MMQ=60,60;MPOS=32;POPAF=7.30;TLOD=4.20  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,1:0,0:0,0,1,0
chr16   54464154    .   G   A   .   .   DP=1;ECNT=1;MBQ=0,90;MFRL=0,145;MMQ=60,60;MPOS=15;POPAF=7.30;TLOD=4.20  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,1:0,0:0,0,0,1
chr18   10035436    .   C   T   .   .   DP=1;ECNT=1;MBQ=0,45;MFRL=0,131;MMQ=60,60;MPOS=16;POPAF=7.30;TLOD=4.20  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,0:0,1:0,0,1,0
chr18   10511383    .   A   G   .   .   DP=1;ECNT=1;MBQ=0,45;MFRL=0,160;MMQ=60,60;MPOS=22;POPAF=7.30;TLOD=4.20  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,1:0,0:0,0,0,1
chr18   75229298    .   A   G   .   .   DP=1;ECNT=1;MBQ=0,90;MFRL=0,127;MMQ=60,60;MPOS=8;POPAF=7.30;TLOD=4.20   GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,1:0,0:0,0,0,1
chr18   77560878    .   AA  TT  .   .   DP=1;ECNT=1;MBQ=0,90;MFRL=0,100;MMQ=60,60;MPOS=29;POPAF=7.30;TLOD=4.20  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:0,1:0.667:1:0,1:0,0:0,0,0,1
Ben-Habermeyer commented 2 years ago

@Ben-Habermeyer We had a few PRs in late 2021 that may have fixed this. If it's still occurring in the latest GATK version I would like to take a look at it.

ok @davidbenjamin I got a chance to test with latest release 4.3.0.0 and the issue seems to be mostly resolved when running --alleles on our test samples. Additionally, FilterMutectCalls works on low DP variants.

For control samples, using the --alleles option results in an error due to the value of the stats callable.

Combination of this call:

chr18   77560878        .       AA      TT      .       .       AS_SB_TABLE=0,0|0,0;DP=1;ECNT=2;MBQ=0,90;MFRL=0,100;MMQ=60,60;MPOS=29;POPAF=7.30;TLOD=4.20      GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1:0,1:0.667:1:0,1:0,0:0,1:0|1:77560878_AA_TT:77560878:0,0,0,1

and the stats file containing:

callable  1.0

results in FilterMutectCalls exception

java.lang.IllegalArgumentException: logValues must be non-infinite and non-NAN
        at org.broadinstitute.hellbender.utils.NaturalLogUtils.logSumExp(NaturalLogUtils.java:84)
        at org.broadinstitute.hellbender.utils.NaturalLogUtils.normalizeLog(NaturalLogUtils.java:51)
        at org.broadinstitute.hellbender.utils.NaturalLogUtils.normalizeFromLogToLinearSpace(NaturalLogUtils.java:27)
        at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.posteriorProbabilityOfError(Mutect2FilteringEngine.java:93)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.probabilityOfSequencingError(SomaticClusteringModel.java:140)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.probabilityOfSomaticVariant(SomaticClusteringModel.java:146)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.performEMIteration(SomaticClusteringModel.java:345)
        at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.learnAndClearAccumulatedData(SomaticClusteringModel.java:330)
        at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.learnParameters(Mutect2FilteringEngine.java:153)
        at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls.afterNthPass(FilterMutectCalls.java:165)
        at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:44)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Updating the stats file to a non 1.0, 1.1 value allows the filtering to proceed.