broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.65k stars 582 forks source link

Why Can't I filter my vcf using FilterMutectCalls? #6102

Closed wangshun1121 closed 4 years ago

wangshun1121 commented 4 years ago

I using following command to filter my vcf file:

gatk --java-options "-Xmx4g" FilterMutectCalls -O Filtered.vcf -V Try.vcf.gz

And failed with following info:

Using GATK jar /root/miniconda2/envs/aginome-rna/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /root/miniconda2/envs/aginome-rna/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar FilterMutectCalls -O Filtered.vcf -V Try.vcf.gz
09:54:21.606 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/root/miniconda2/envs/aginome-rna/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:54:28.491 INFO  FilterMutectCalls - ------------------------------------------------------------
09:54:28.492 INFO  FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.1.0.0
09:54:28.492 INFO  FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
09:54:28.492 INFO  FilterMutectCalls - Executing as root@e19ded81d0c5 on Linux v4.15.0-55-generic amd64
09:54:28.493 INFO  FilterMutectCalls - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
09:54:28.493 INFO  FilterMutectCalls - Start Date/Time: August 20, 2019 9:54:21 AM UTC
09:54:28.493 INFO  FilterMutectCalls - ------------------------------------------------------------
09:54:28.493 INFO  FilterMutectCalls - ------------------------------------------------------------
09:54:28.494 INFO  FilterMutectCalls - HTSJDK Version: 2.18.2
09:54:28.494 INFO  FilterMutectCalls - Picard Version: 2.18.25
09:54:28.494 INFO  FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:54:28.494 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:54:28.494 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:54:28.495 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:54:28.495 INFO  FilterMutectCalls - Deflater: IntelDeflater
09:54:28.495 INFO  FilterMutectCalls - Inflater: IntelInflater
09:54:28.495 INFO  FilterMutectCalls - GCS max retries/reopens: 20
09:54:28.495 INFO  FilterMutectCalls - Requester pays: disabled
09:54:28.495 INFO  FilterMutectCalls - Initializing engine
09:54:28.840 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/md0/DataProcess/Ranshi/Mutect2/Try.vcf.gz
09:54:28.923 INFO  FilterMutectCalls - Done initializing engine
09:54:28.975 INFO  ProgressMeter - Starting traversal
09:54:28.975 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
09:54:28.976 INFO  FilterMutectCalls - Starting first pass through the variants
09:54:29.139 INFO  FilterMutectCalls - Shutting down engine
[August 20, 2019 9:54:29 AM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls done. Elapsed time: 0.13 minutes.
Runtime.totalMemory()=1809317888
java.lang.ArrayIndexOutOfBoundsException: 2
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.applyContaminationFilter(Mutect2FilteringEngine.java:64)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.calculateFilters(Mutect2FilteringEngine.java:518)
    at org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls.firstPassApply(FilterMutectCalls.java:130)
    at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.lambda$traverseVariants$0(TwoPassVariantWalker.java:76)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverseVariants(TwoPassVariantWalker.java:74)
    at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverse(TwoPassVariantWalker.java:27)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

This is the vcf files. Try.vcf.gz

lbergelson commented 4 years ago

@wangshun1121 That definitely sounds like a bug. Is it possible for you to try running with the newest released version 4.1.3.0 in order to test if it's still present?

wangshun1121 commented 4 years ago

@lbergelson I am trying latest release, I use following command:

./gatk-4.1.3.0/gatk --java-options "-Xmx4g" FilterMutectCalls -O Filtered.vcf -V Try.vcf.gz -R ~/human.fa/ucsc.hg19.fasta

and got following Info:

Using GATK jar /mnt/md0/DataProcess/Ranshi/Mutect2/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /mnt/md0/DataProcess/Ranshi/Mutect2/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar FilterMutectCalls -O Filtered.vcf -V Try.vcf.gz -R /home/imp/human.fa/ucsc.hg19.fasta
09:44:27.763 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/md0/DataProcess/Ranshi/Mutect2/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 21, 2019 9:44:29 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:44:29.499 INFO  FilterMutectCalls - ------------------------------------------------------------
09:44:29.500 INFO  FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.1.3.0
09:44:29.500 INFO  FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
09:44:29.500 INFO  FilterMutectCalls - Executing as imp@imp-WorkStation on Linux v4.15.0-55-generic amd64
09:44:29.500 INFO  FilterMutectCalls - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10
09:44:29.501 INFO  FilterMutectCalls - Start Date/Time: 2019年8月21日 上午09时44分27秒
09:44:29.501 INFO  FilterMutectCalls - ------------------------------------------------------------
09:44:29.501 INFO  FilterMutectCalls - ------------------------------------------------------------
09:44:29.502 INFO  FilterMutectCalls - HTSJDK Version: 2.20.1
09:44:29.502 INFO  FilterMutectCalls - Picard Version: 2.20.5
09:44:29.502 INFO  FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:44:29.502 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:44:29.502 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:44:29.503 INFO  FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:44:29.503 INFO  FilterMutectCalls - Deflater: IntelDeflater
09:44:29.503 INFO  FilterMutectCalls - Inflater: IntelInflater
09:44:29.503 INFO  FilterMutectCalls - GCS max retries/reopens: 20
09:44:29.503 INFO  FilterMutectCalls - Requester pays: disabled
09:44:29.503 INFO  FilterMutectCalls - Initializing engine
09:44:29.869 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/md0/DataProcess/Ranshi/Mutect2/Try.vcf.gz
09:44:29.942 INFO  FilterMutectCalls - Done initializing engine
09:44:30.029 INFO  FilterMutectCalls - Shutting down engine
[2019年8月21日 上午09时44分30秒] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2097152000
***********************************************************************

A USER ERROR has occurred: Mutect stats table Try.vcf.gz.stats not found.  When Mutect2 outputs a file calls.vcf it also creates a calls.vcf.stats file.  Perhaps this file was not moved along with the vcf, or perhaps it was not delocalized from a virtual machine while running in the cloud.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Try.vcf.stat was lost, I have to re-run Mutect2 using latest release. It seems that this bug was fixed.

wangshun1121 commented 4 years ago

It's OK using gatk-4.1.3

lbergelson commented 4 years ago

@wangshun1121 Good to hear. I love bugs that have already been fixed :)

andrewrech commented 4 years ago

@wangshun1121 @lbergelson

Thanks for reporting this. I'm confused -- in the latest release, is the optional stats file now not required?

lbergelson commented 4 years ago

I think the optional stats are now no longer optional. @takutosato could say for sure though.

andrewrech commented 4 years ago

Thanks @lbergelson

Indeed on master the command fails, however the argument seems optional

I would imagine a fair number of users may not have this file saved (like myself). Would it be possible to maintain some backwards compatibility here?

wangshun1121 commented 4 years ago

@andrewrech it seems that some modules, such as FilterMutectCalls, CreateSomaticPanelOfNormals, etc, are changing significantly in recent gatk releases.

lbergelson commented 4 years ago

Ah, that looks like a bug. The argument should be required I think. I opened a ticket https://github.com/broadinstitute/gatk/issues/6124.

I think the whole Mutect2 / FilterMutectCalls chain of tools are pretty tightly coupled together and are evolving fairly quickly. So it's not a great idea to try to re-filter calls made with an earlier version of Mutect with a newer one.

wangshun1121 commented 4 years ago

In my pipeline, when I called somatic variants using Mutect2, I would mark variants using FilterMutectCalls immediately. I think that FilterMutectCalls should be merged into Mutect2. What's more, I ran Mutect2 by chromosome parallelly by myself, and finally I would merge variants on different chrs into one VCF file-That means that saving .stat file of final result is impossible!

andrewrech commented 4 years ago

Hello all,

Thank you.

My pipeline is as @wangshun1121 describes as well - I run regions in parallel.

I would prefer if this argument remained optional, as this will permanently break pipelines running Mutect2 on regions in parallel.

Does anyone know offhand what old version I can pin to as a work around?

@lbergelson

lbergelson commented 4 years ago

@andrewrech I'm not certain. Let's let @davidbenjamin or @takutosato weigh in before we get too far down this rabbit hole 🐇. I'm not super knowledgeable about the details of mutect and don't want to give any bad information.

takutosato commented 4 years ago

@andrewrech @wangshun1121 you can merge the stats from the parallel runs of Mutect using MergeStats as is done in the best practice wdl:

https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2.wdl#L331

davidbenjamin commented 4 years ago

@wangshun1121 We decoupled FilterMutectCalls from Mutect2 a while back in order to enable quick experimentation with different filtering arguments. Since then, a new reason has come up: FilterMutectCalls looks at data from the entire sample, and therefore merging it into Mutect2 is not compatible with scattering over different genomic regions. By the way, have you tried using the best practices WDL? Our goal is to spare users the nuisance of writing pipelines themselves.

wangshun1121 commented 4 years ago

@davidbenjamin I develop my pipelines with perl & Nextflow myself: it's necessary for me. And I would look up best practices WDL scripts when coding my own pipelines.

davidbenjamin commented 4 years ago

Thanks @wangshun1121; it's always helpful to understand how our tools are being used in the wild.

aditisk commented 4 years ago

It's OK using gatk-4.1.3

@lbergelson I am getting the same error as @wangshun1121 mentions above. I'm using version 4.1.4.0. I am assuming this version has the bug fix too like v4.1.3 that works for @wangshun1121 ?

Thanks.