broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.69k stars 588 forks source link

PairHMM related bug #5543

Closed kachulis closed 5 years ago

kachulis commented 5 years ago

Coming from https://gatkforums.broadinstitute.org/gatk/discussion/9358/gatk-runtime-error-read-max-length-must-be-0-but-got-0-with-1000g-bam#latest. There seems to be a bug somewhere in the implementation of pair hmm, which multiple users have run into. The most recent user reported running Mutect2 on two different machines with the same inputs, and same versions of GATK. One run was successful, while the other failed with

java.lang.IllegalArgumentException: readMaxLength must be > 0 but got 0
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:730)
    at org.broadinstitute.hellbender.utils.pairhmm.PairHMM.initialize(PairHMM.java:152)
    at org.broadinstitute.hellbender.utils.pairhmm.N2MemoryPairHMM.initialize(N2MemoryPairHMM.java:28)
    at org.broadinstitute.hellbender.utils.pairhmm.LoglessPairHMM.initialize(LoglessPairHMM.java:7)
    at org.broadinstitute.hellbender.utils.pairhmm.PairHMM.initialize(PairHMM.java:177)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.initializePairHMM(PairHMMLikelihoodCalculationEngine.java:242)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:177)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:207)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:212)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:291)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:979)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:201)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289) 

after an earlier warning

10:31:03.566 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
10:31:03.566 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!

It seems like there is some sort of bug which is leading to pairhmm.initialize() being called with readMaxLength=0 at https://github.com/broadinstitute/gatk/blob/95155e886caabf0ea4880ff255388dea33878cfa/src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java#L242

when VectorLoglessPairHmm is being used this doesn't cause any issue, because initialize() is overridden and the parameter readMaxLength is ignored. However, when LoglessPairHMM is used, PairHmm.initialize() is eventually called with readMaxLength=0, which leads to the stack trace above.

af8 commented 5 years ago

Hi, are there any available updates on this bug ? I have run into it many times with gatk-4.1.0.0. Thank you!

lbergelson commented 5 years ago

@davidbenjamin Could you take a look at this? @TedBrookings thinks it might be as simple as changing the check to allow 0 length reads when initializing the pairHmm. Neither of us are sure that that's a great solution though.

It seems like if you have no read bases you can't do any useful calculation. Should there be an earlier check in mutect that avoids assembling a region if there aren't any reads with bases?

JJBio commented 5 years ago

Hi, thanks for looking into this! Could you also fix this for the GATK 3.8 version if possible? I've run into this many times.

davidbenjamin commented 5 years ago

@lbergelson I can look at this.

davidbenjamin commented 5 years ago

@TedBrookings is correct that we can simply waive the check. We could check for an empty region upstream in Mutect2, but we would also have to do it for HaplotypeCaller, which would be annoying. There's basically no cost to running PairHMM on an empty set of reads once we have already assembled, so I don't see a good reason to treat this as a special case.

TedBrookings commented 5 years ago

@JJBio Unfortunately GATK 3 is end-of-life and will not have any more releases/bug-fixes.

davidbenjamin commented 5 years ago

Also @JJBio Mutect2 is not like HaplotypeCaller, which is not too changed from the 3.8 version. It's a completely different tool -- new genotyping model, new assembly algorithms, new filters etc.

af8 commented 5 years ago

Just for information, when do you think a new release with this corrected bug will be available (possibly also in bioconda) ?

ghost commented 5 years ago

mutect2 stopped at chromosome 1

$ java -jar -Xmx12g gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar Mutect2 -R Homo_sapiens_assembly19.fasta -I Specimen_SNCR.10_1.bam -tumor Specimen_10_1 -O mutect2/10_1.vcf& [1] 40657 $ 09:49:03.454 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/Tools/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so 09:49:05.899 INFO Mutect2 - ------------------------------------------------------------ 09:49:05.899 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.0.0 09:49:05.900 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/

09:49:05.900 INFO Mutect2 - Java runtime: IBM J9 VM v8.0.5.25 - pxa6480sr5fp25-20181030_01(SR5 FP25) 09:49:05.901 INFO Mutect2 - Start Date/Time: March 7, 2019 9:49:03 AM EST 09:49:05.901 INFO Mutect2 - ------------------------------------------------------------ 09:49:05.901 INFO Mutect2 - ------------------------------------------------------------ 09:49:05.901 INFO Mutect2 - HTSJDK Version: 2.18.2 09:49:05.901 INFO Mutect2 - Picard Version: 2.18.25 09:49:05.902 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2 09:49:05.902 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 09:49:05.902 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 09:49:05.902 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 09:49:05.902 INFO Mutect2 - Deflater: IntelDeflater 09:49:05.902 INFO Mutect2 - Inflater: IntelInflater 09:49:05.902 INFO Mutect2 - GCS max retries/reopens: 20 09:49:05.902 INFO Mutect2 - Requester pays: disabled 09:49:05.902 INFO Mutect2 - Initializing engine 09:49:06.887 INFO Mutect2 - Done initializing engine 09:49:06.935 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/Tools/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so 09:49:06.937 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported 09:49:06.937 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation! 09:49:07.007 INFO ProgressMeter - Starting traversal 09:49:07.007 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute 09:49:17.023 INFO ProgressMeter - 1:139173 0.2 480 2875.7 09:49:27.704 INFO ProgressMeter - 1:763661 0.3 2590 7508.3 09:49:38.001 INFO ProgressMeter - 1:958723 0.5 3290 6369.0 09:49:49.182 INFO ProgressMeter - 1:981050 0.7 3380 4808.5 09:50:02.383 INFO ProgressMeter - 1:988991 0.9 3440 3727.3 09:50:13.586 INFO ProgressMeter - 1:1227096 1.1 4290 3866.1 09:50:23.594 INFO ProgressMeter - 1:1460850 1.3 5240 4105.1 09:50:34.165 INFO ProgressMeter - 1:1960541 1.5 7060 4860.1 09:50:46.537 INFO ProgressMeter - 1:2489135 1.7 8930 5383.3 09:50:56.541 INFO ProgressMeter - 1:3195743 1.8 11330 6206.3 09:51:06.566 INFO ProgressMeter - 1:6554142 2.0 22770 11427.0 09:51:21.997 INFO ProgressMeter - 1:8925092 2.2 30900 13734.4 09:51:32.919 INFO ProgressMeter - 1:8929959 2.4 30920 12714.5 09:51:44.170 INFO ProgressMeter - 1:10240032 2.6 35490 13549.0 09:51:54.380 INFO ProgressMeter - 1:11290109 2.8 39210 14056.0 09:52:04.380 INFO ProgressMeter - 1:12036002 3.0 41870 14163.4 09:52:15.321 INFO ProgressMeter - 1:14109134 3.1 48970 15602.7 09:52:25.709 INFO ProgressMeter - 1:16264047 3.3 56380 17024.5 09:52:36.145 INFO ProgressMeter - 1:16891523 3.5 58590 16809.0 09:52:50.980 INFO ProgressMeter - 1:16893022 3.7 58600 15698.3 09:53:01.463 INFO ProgressMeter - 1:17740440 3.9 61570 15756.5 09:53:12.294 INFO ProgressMeter - 1:19379390 4.1 67150 16425.7 09:53:22.294 INFO ProgressMeter - 1:20576686 4.3 71380 16776.4 09:53:32.681 INFO ProgressMeter - 1:21106727 4.4 73230 16538.3 09:53:44.258 INFO ProgressMeter - 1:21270052 4.6 73820 15975.4 09:53:54.757 INFO ProgressMeter - 1:21754504 4.8 75500 15742.8 09:54:04.928 INFO ProgressMeter - 1:23419224 5.0 81370 16387.6 09:54:15.956 INFO ProgressMeter - 1:23812728 5.1 82750 16070.6 09:54:31.008 INFO ProgressMeter - 1:24023237 5.4 83470 15457.4 09:54:33.610 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 58.921665822 09:54:33.611 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 72.92 sec 09:54:33.612 INFO Mutect2 - Shutting down engine [March 7, 2019 9:54:33 AM EST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 5.51 minutes. Runtime.totalMemory()=193003520 java.lang.IllegalArgumentException: readMaxLength must be > 0 but got 0 at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:730) at org.broadinstitute.hellbender.utils.pairhmm.PairHMM.initialize(PairHMM.java:152) at org.broadinstitute.hellbender.utils.pairhmm.N2MemoryPairHMM.initialize(N2MemoryPairHMM.java:28) at org.broadinstitute.hellbender.utils.pairhmm.LoglessPairHMM.initialize(LoglessPairHMM.java:7) at org.broadinstitute.hellbender.utils.pairhmm.PairHMM.initialize(PairHMM.java:177) at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.initializePairHMM(PairHMMLikelihoodCalculationEngine.java:242) at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:177) at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:229) at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:232) at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:291) at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267) at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205) at org.broadinstitute.hellbender.Main.main(Main.java:291)

davidbenjamin commented 5 years ago

@af8 We are tentatively planning to release 4.1.1.0 with this bug fix next week.

davidbenjamin commented 5 years ago

@rong923 it looks like you are encountering the same issue. You can build the the gatk master branch to get a fix now, or wait for the 4.1.1.0 release.

ghost commented 5 years ago

may I ask why my other samples didn't run into the issue but only one sample did?

davidbenjamin commented 5 years ago

It's an edge case that occurs only (I think) when all the reads supporting a variant end very near the boundary of the assembly region.

af8 commented 5 years ago

Thanks a lot @davidbenjamin In the meantime I have compiled the master branch when I saw this issue was resolved and it worked fine.

I tried also to create a pon with this fresh compiled version but I got some errors (don't remember exactly what right now). Looks like you are in the middle of changing the pipeline of pon creation by integrating GenomicsDB as an intermediate, right ? Do you think it will also be ready for this next release ? I would like to create the pon with the same GATK version. Problem is I can not fall back on an earlier version because I would definitely get the bug we are talking about in this thread :-)

Perhaps we should also change our computing nodes at some point I guess :)

Thanks again

davidbenjamin commented 5 years ago

@af8 The next release will also have a WDL workflow for creating the pon in the new way.

ghost commented 5 years ago

wondering if this issue is resolved yet in the current version?

af8 commented 5 years ago

Yes @rong923, go with >= 4.1.1.0 and you are safe. I ve been running hundreds of pipelines without a problem now.