broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

No .stats file produced by Mutect2 #6271

Open jupollet opened 4 years ago

jupollet commented 4 years ago

Bug Report

Affected tool(s) or class(es)

Affected version(s)

Description

The VCF is produced, but the .stats file is not.

Steps to reproduce

On a cluster, with SBATCH options (--constraint avx2, 150 GB of RAM, 6 CPUs):

parallel -k --plus 'gatk Mutect2 -R /omaha-beach/jpollet/MYD88/data/ref/BALBcJ.fasta -I {} -O {..}unf.vcf' ::: *.md.bam

The same issue occurs with for loops.

fleharty commented 4 years ago

@jupollet I'm unable to reproduce this bug. Could you provide more details?

jupollet commented 4 years ago

What kind of additional information? I launch this command on the GenOuest cluster inside a bioconda environment containing GATK version 4.1.4.0 and other tools. The BAMs were generated by bwa mem and sorted with samtools; read groups were added with gatk AddOrReplaceReadGroups, duplicates were removed with gatk MarkDuplicates, and the new BAMs were then indexed.

SebastianHollizeck commented 4 years ago

It might be something that has been discussed in this thread https://gatkforums.broadinstitute.org/gatk/discussion/24595/mutect2-in-gatk-4-1-4-not-producing-stats-file

jupollet commented 4 years ago

That thread suggests adding more RAM. I tried with 8 CPUs and 500 GB of RAM, but it still does not work.

Error for one BAM file:


15:47:36.554 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/share/gatk4-4.1.4.0-1/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 28, 2019 3:47:37 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:47:37.239 INFO  Mutect2 - ------------------------------------------------------------
15:47:37.240 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.4.0
15:47:37.240 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
15:47:37.240 INFO  Mutect2 - Executing as jpollet@cl1n031.genouest.org on Linux v3.10.0-693.21.1.el7.x86_64 amd64
15:47:37.240 INFO  Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
15:47:37.246 INFO  Mutect2 - Start Date/Time: 28 novembre 2019 15:47:36 CET
15:47:37.246 INFO  Mutect2 - ------------------------------------------------------------
15:47:37.246 INFO  Mutect2 - ------------------------------------------------------------
15:47:37.246 INFO  Mutect2 - HTSJDK Version: 2.20.3
15:47:37.246 INFO  Mutect2 - Picard Version: 2.21.1
15:47:37.247 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:47:37.247 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:47:37.247 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:47:37.247 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:47:37.247 INFO  Mutect2 - Deflater: IntelDeflater
15:47:37.247 INFO  Mutect2 - Inflater: IntelInflater
15:47:37.247 INFO  Mutect2 - GCS max retries/reopens: 20
15:47:37.247 INFO  Mutect2 - Requester pays: disabled
15:47:37.247 INFO  Mutect2 - Initializing engine
15:47:41.204 INFO  Mutect2 - Done initializing engine
15:47:42.352 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/share/gatk4-4.1.4.0-1/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:47:42.423 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/share/gatk4-4.1.4.0-1/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
15:47:42.482 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
15:47:42.483 INFO  IntelPairHmm - Available threads: 8
15:47:42.483 INFO  IntelPairHmm - Requested threads: 4
15:47:42.483 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
15:47:42.936 INFO  ProgressMeter - Starting traversal
15:47:42.936 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
15:47:53.565 INFO  ProgressMeter - ENA|LVXK01000001|LVXK01000001.1:19555              0.2                    90            508.0
15:48:05.962 INFO  ProgressMeter - ENA|LVXK01000001|LVXK01000001.1:136820              0.4                   600           1563.5
15:48:16.023 INFO  ProgressMeter - ENA|LVXK01000001|LVXK01000001.1:360783              0.6                  1560           2828.9
15:48:19.342 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.010346494000000001
15:48:19.342 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 6.453042841
15:48:19.347 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 10.39 sec
15:48:19.348 INFO  Mutect2 - Shutting down engine
[28 novembre 2019 15:48:19 CET] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.72 minutes.
Runtime.totalMemory()=3822583808
java.lang.IllegalArgumentException: Cannot construct fragment from more than two reads
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:725)
    at org.broadinstitute.hellbender.utils.read.Fragment.create(Fragment.java:36)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.broadinstitute.hellbender.utils.genotyper.AlleleLikelihoods.groupEvidence(AlleleLikelihoods.java:595)
    at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:93)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:251)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:320)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:80: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(args) is 0 or (len(args) is 1 and (args[0] == "--help" or args[0] == "-h")):
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:80: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(args) is 0 or (len(args) is 1 and (args[0] == "--help" or args[0] == "-h")):
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:117: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(args) is 1 and args[0] == "--list":
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:308: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if call(["gsutil", "-q", "stat", gcsjar]) is 0:
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:312: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if call(["gsutil", "cp", jar, gcsjar]) is 0:
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:467: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if not len(properties) is 0:
/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/bin/gatk:471: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if not len(filesToAdd) is 0:
Using GATK jar /home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/share/gatk4-4.1.4.0-1/gatk-package-4.1.4.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/share/gatk4-4.1.4.0-1/gatk-package-4.1.4.0-local.jar Mutect2 -R /omaha-beach/jpollet/MYD88/data/ref/BALBcJ.fasta -I /omaha-beach/jpollet/MYD88/result/valide_3060_R1vsBALBcJ.sorted.md.bam -O /omaha-beach/jpollet/MYD88/result/valide_3060_R1vsBALBcJ.sortedunf.vcf
15:47:36.551 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/genouest/uni_limoges_fr/jpollet/.conda/envs/myd88/share/gatk4-4.1.4.0-1/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
jupollet commented 4 years ago

I checked with the SLURM sacct command and the job has no RAM problem; it runs to the end but does not produce a .stats file, and outputs only the .vcf and .vcf.idx.
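
For anyone checking a whole batch of outputs, here is a minimal sketch that lists VCFs with no stats file next to them. It assumes the stats file is named after the VCF with ".stats" appended, as the GATK error message quoted later in this thread describes; list_missing_stats is a hypothetical helper name, not a GATK command.

```shell
# List every VCF in a directory that has no matching .stats file beside it.
# list_missing_stats is a hypothetical helper, not part of GATK.
list_missing_stats() {
    dir="${1:-.}"
    for vcf in "$dir"/*.vcf "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue              # skip glob patterns that matched nothing
        [ -e "$vcf.stats" ] || printf '%s\n' "$vcf"
    done
}

# Example call (the path is a placeholder):
# list_missing_stats /path/to/results
```

An empty result means every VCF in the directory has its stats file.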

fleharty commented 4 years ago

Jupollet,

This is a known issue and should be resolved by the most recent release gatk4-4.1.4.1. This was released last week, so you may need to just update.

If that doesn’t work, you may need to disable supplementary reads.

Thanks,

Mark
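
One possible way to try the second suggestion is sketched below. It assumes that "disable supplementary reads" can be done with GATK's NotSupplementaryAlignmentReadFilter read filter; that mapping is an assumption and is not confirmed in this thread. The function name and the GATK override variable are hypothetical and only make the sketch easy to dry-run.

```shell
# Run Mutect2 with supplementary alignments filtered out.
# Assumption: the read filter below is what "disable supplementary reads" refers to.
# GATK is a hypothetical override variable; it defaults to the gatk wrapper.
mutect2_no_supplementary() {
    ref="$1"; bam="$2"; out="$3"
    "${GATK:-gatk}" Mutect2 \
        -R "$ref" \
        -I "$bam" \
        -O "$out" \
        --read-filter NotSupplementaryAlignmentReadFilter
}

# Example call (file names are placeholders):
# mutect2_no_supplementary BALBcJ.fasta sample.md.bam sample.unf.vcf
```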

jupollet commented 4 years ago

Hi, I updated GATK today. After 158 minutes of variant calling on the same BAM files, I have another issue:

[3 décembre 2019 13:57:42 CET] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 158.34 minutes.
Runtime.totalMemory()=28647096320
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.LinkedHashMap$LinkedKeySet.iterator(LinkedHashMap.java:543)
    at java.util.HashSet.iterator(HashSet.java:173)
    at java.util.AbstractCollection.toArray(AbstractCollection.java:137)
    at java.util.LinkedList.addAll(LinkedList.java:408)
    at java.util.LinkedList.addAll(LinkedList.java:387)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph$BaseGraphIterator.next(BaseGraph.java:774)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph$BaseGraphIterator.next(BaseGraph.java:723)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:505)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:514)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:492)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:401)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:148)
    at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:290)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:224)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:320)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
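
The log above reports Runtime.totalMemory()=28647096320, roughly 28.6 GB, so the JVM heap may be capped well below what the job was allocated. A minimal sketch of raising the heap through the gatk wrapper's --java-options argument follows; the heap size, the function name and the GATK override variable are illustrative assumptions, and a larger heap may not help if the region being assembled is extremely deep.

```shell
# Run Mutect2 with an explicit maximum Java heap size.
# heap is passed to -Xmx (for example 64g); pick a value below the job's memory limit.
# GATK is a hypothetical override variable; it defaults to the gatk wrapper.
mutect2_with_heap() {
    heap="$1"; ref="$2"; bam="$3"; out="$4"
    "${GATK:-gatk}" --java-options "-Xmx$heap" Mutect2 \
        -R "$ref" \
        -I "$bam" \
        -O "$out"
}

# Example call (values are placeholders):
# mutect2_with_heap 64g BALBcJ.fasta sample.md.bam sample.unf.vcf
```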
davidbenjamin commented 4 years ago

@jupollet What is the read depth like for this bam? Is it human tumor-only calling?

OlenaMaiakovska commented 2 years ago

Dear @fleharty

I have GATK version 4.2.2, but the problem is exactly the same: I get the VCF and its index file, without the stats file:


A USER ERROR has occurred: Mutect stats table somatic_449_WT_vs_6KO_Pd.vcf.gz.stats not found. When Mutect2 outputs a file calls.vcf it also creates a calls.vcf.stats file. Perhaps this file was not moved along with the vcf, or perhaps it was not delocalized from a virtual machine while running in the cloud.

riasc commented 1 year ago

Is there any update on that? Got the same issue on 4.4.0.0

ryanyord commented 1 year ago

@riasc I got the same issue on 4.4.0.0 and found that it was due to running via slurm.

Not sure why, but it works when run from the head node, whereas running via sbatch creates only the .vcf and .vcf.idx. The error message reads:

A USER ERROR has occurred: Mutect stats table calls.vcf.stats not found. When Mutect2 outputs a file calls.vcf it also creates a calls.vcf.stats file. Perhaps this file was not moved along with the vcf, or perhaps it was not delocalized from a virtual machine while running in the cloud.

which sounds related.

ryanyord commented 1 year ago

Resolved this by adding the --tmp-dir argument to the command.

https://gatk.broadinstitute.org/hc/en-us/community/posts/16559299486619-Mutect2-No-stats-file-created-with-SLURM?page=1#community_comment_16677977653019
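
A minimal sketch of that workaround, for use inside an sbatch script. The function name, the directory layout and the GATK override variable are assumptions for illustration; the thread does not establish why setting --tmp-dir helps under SLURM.

```shell
# Run Mutect2 with an explicit temporary directory, creating it first.
# GATK is a hypothetical override variable; it defaults to the gatk wrapper.
mutect2_with_tmpdir() {
    ref="$1"; bam="$2"; out="$3"; tmp="$4"
    mkdir -p "$tmp" || return 1        # stop if the directory cannot be created
    "${GATK:-gatk}" Mutect2 \
        -R "$ref" \
        -I "$bam" \
        -O "$out" \
        --tmp-dir "$tmp"
}

# Example call (paths are placeholders):
# mutect2_with_tmpdir BALBcJ.fasta sample.md.bam sample.unf.vcf "$PWD/gatk_tmp"
```

Pointing the temporary directory at a location the compute node can write to, and checking afterwards that the .stats file exists next to the VCF, is a reasonable way to confirm whether this applies to a given cluster.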