Closed zhangbiwu closed 2 years ago
Did you run MuTect2 from GATK3?
A few people have raised this issue before. MuTect2 from GATK4 (the supported version) has headers like ##normal_sample=NORMAL_NAME and ##tumor_sample=TUMOR_NAME, but the ones from GATK3 do not.
If the error is due to GATK3, a simple fix is to add those two lines to the header of the GATK3 MuTect2 VCF output.
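The header fix described above can be scripted. The following is only a sketch and not part of SomaticSeq: `add_sample_headers` is a hypothetical helper, and it assumes the names you pass in are the same as the sample column names on the `#CHROM` line of your VCF.

```python
def add_sample_headers(vcf_lines, tumor_name, normal_name):
    """Return the VCF lines with ##normal_sample / ##tumor_sample added if absent.

    The two lines are inserted directly before the #CHROM column header.
    Lines that are already present are left alone, so the call is idempotent.
    """
    has_tumor = any(line.startswith("##tumor_sample=") for line in vcf_lines)
    has_normal = any(line.startswith("##normal_sample=") for line in vcf_lines)

    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            if not has_normal:
                out.append(f"##normal_sample={normal_name}\n")
            if not has_tumor:
                out.append(f"##tumor_sample={tumor_name}\n")
        out.append(line)
    return out


# In-memory demo; for a real file, read it with readlines() and write the result back.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL\n",
]
patched = add_sample_headers(example, tumor_name="TUMOR", normal_name="NORMAL")
```

Writing to a new file rather than overwriting the MuTect2 output keeps the original around in case the sample names turn out to be wrong.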
If you were running GATK3's MuTect2, I would recommend upgrading to GATK4, which is much faster.
I ran the dockerized somatic mutation callers. The following is the command I used:
makeSomaticScripts.py \
paired \
--output-directory paired_example \
--tumor-bam tumor/sim7.snv.bam \
--normal-bam ../normal/sim15.normal.bam \
--genome-reference ../ref_fasta/hg19.fasta \
--truth-snv labels/sim7.snv.True_VCF \
--dbsnp-vcf ../dpsnp/dbsnp_150.hg19.vcf \
--run-mutect2 --run-vardict --run-strelka2 --run-somaticseq --train-somaticseq -nt 2 --run-workflow
thanks
Check the headers of tumor/sim7.snv.bam and normal/sim15.normal.bam, and see if they have different sample names, e.g., samtools view -H tumor/sim7.snv.bam.
There should be a header line like @RG ID:ReadGroupID SM:SampleName LB:LibraryPrep PL:ILLUMINA.
The "SampleName" must be different for the tumor and normal bam files. That is a requirement for MuTect2.
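If you want to check this without reading the headers by eye, here is a rough sketch. It assumes you have already captured the text printed by samtools view -H for each bam and that the fields are tab-separated; `read_group_samples` and `samples_are_distinct` are hypothetical helper names, not part of SomaticSeq or samtools.

```python
def read_group_samples(header_text):
    """Collect the SM (sample name) values from the @RG lines of a SAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are TAG:value pairs separated by tabs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def samples_are_distinct(tumor_header, normal_header):
    """True only if both headers name a sample and no name is shared."""
    tumor = read_group_samples(tumor_header)
    normal = read_group_samples(normal_header)
    return bool(tumor) and bool(normal) and tumor.isdisjoint(normal)
```

Note that a lower-case @rg line or space-separated fields would make `read_group_samples` return an empty set, which is itself a hint that the header is malformed.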
Thanks. When I check the header of tumor/sim7.snv.bam, I get:
@RG ID:69a29d45-8b8e-487e-8709-f3b074a7e3a2 CN:BS SM:sim7.snv.sort LB:bamsurgeon PL:COMPLETE
@PG PN:bamsurgeon ID:bamsurgeon
When I check the header of normal/sim15.normal.bam, I get:
@RG ID:7408b2ee-2e07-4094-ae13-efbed0ad1bf2 CN:BS SM:sim15.normal LB:bamsurgeon PL:COMPLETE
@PG PN:bamsurgeon ID:bamsurgeon
@PG ID:samtools PN:samtools PP:bamsurgeon VN:1.10 CL:samtools view -H ../normal/sim15.normal.bam
So the "SampleName" values are different.
Can you send me MuTect2's vcf file? I'll take a look at what's wrong.
Thanks. I checked the MuTect2.vcf file and found that it contains nothing.
In your bam file headers, is it lower-case @rg or actually @RG? The fields are separated by tabs, right? Make sure MuTect2 runs on the pair of bam files. Try executing a mutect2.*.cmd script in one of those logs directories and see what happens.
Yes, they are @RG. I executed a mutect2.*.cmd script and got this output:
Start at 2021/12/01 16:39:10
08:39:16.849 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/libs/gatk-package-4.0.5.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:39:17.031 INFO Mutect2 - ------------------------------------------------------------
08:39:17.031 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.0.5.2
08:39:17.031 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
08:39:17.032 INFO Mutect2 - Executing as ?@4dececb90b2f on Linux v5.8.0-36-generic amd64
08:39:17.032 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
08:39:17.032 INFO Mutect2 - Start Date/Time: December 1, 2021 8:39:16 AM UTC
08:39:17.032 INFO Mutect2 - ------------------------------------------------------------
08:39:17.032 INFO Mutect2 - ------------------------------------------------------------
08:39:17.033 INFO Mutect2 - HTSJDK Version: 2.16.0
08:39:17.033 INFO Mutect2 - Picard Version: 2.18.7
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:39:17.033 INFO Mutect2 - Deflater: IntelDeflater
08:39:17.033 INFO Mutect2 - Inflater: IntelInflater
08:39:17.033 INFO Mutect2 - GCS max retries/reopens: 20
08:39:17.033 INFO Mutect2 - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
08:39:17.033 INFO Mutect2 - Initializing engine
08:39:17.065 INFO Mutect2 - Shutting down engine
[December 1, 2021 8:39:17 AM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=510132224
A USER ERROR has occurred: Fasta dict file file:///a61c4b2f178b41dc8bce9b7b7eff8f0d/hg19.dict for reference file:///a61c4b2f178b41dc8bce9b7b7eff8f0d/hg19.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
MuTect2 requires a .dict file for the reference. Create it with Picard: https://gatk.broadinstitute.org/hc/en-us/articles/360036729911-CreateSequenceDictionary-Picard-
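A quick pre-flight check can catch this before the callers start. The sketch below is an assumption-laden illustration, not SomaticSeq code: `missing_reference_files` is a hypothetical helper, and it follows the naming seen in the error message above (hg19.dict next to hg19.fasta). It also looks for the hg19.fasta.fai index, which this thread does not mention but which GATK expects alongside the reference as well.

```python
from pathlib import Path


def missing_reference_files(fasta):
    """List the companion files that are absent next to a reference FASTA.

    Checks for <name>.dict (extension replaced, as in the GATK error message)
    and <name>.fasta.fai (index appended to the full file name).
    """
    fasta = Path(fasta)
    expected = [
        fasta.with_suffix(".dict"),      # e.g. hg19.fasta -> hg19.dict
        Path(str(fasta) + ".fai"),       # e.g. hg19.fasta -> hg19.fasta.fai
    ]
    return [str(path) for path in expected if not path.exists()]
```

Because the workflow mounts the reference directory into the container, the files need to sit in the same directory as the FASTA itself for the check (and MuTect2) to find them.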
The lower-case @rg could also be an issue.
If that gives you problems, you can save the header as a text file using samtools view -H, convert @rg to @RG, and then use samtools reheader to change the headers.
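The middle step (fixing the saved header text) could look something like the sketch below. `uppercase_header_tags` is a hypothetical helper, and it assumes tab-separated header lines; it only upper-cases the record type at the start of each line and leaves every field value untouched.

```python
def uppercase_header_tags(header_text):
    """Upper-case the record type (e.g. @rg -> @RG) at the start of each header line."""
    fixed = []
    for line in header_text.splitlines():
        if line.startswith("@"):
            # Split off only the record type; the rest of the line is kept verbatim.
            tag, sep, rest = line.partition("\t")
            line = tag.upper() + sep + rest
        fixed.append(line)
    return "".join(line + "\n" for line in fixed)


# Intended workflow (shell steps shown as comments; file names are placeholders):
#   samtools view -H in.bam > header.sam
#   ... run uppercase_header_tags() on the contents, save as fixed_header.sam ...
#   samtools reheader fixed_header.sam in.bam > out.bam
```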
thanks, I will try it
Hi, this problem has not been resolved. The following is the run log:
Start at 2021/12/02 17:57:38
INFO 2021-12-02 09:57:41,189 SomaticSeq SomaticSeq Input Arguments: output_directory=/ee7008ee88274839a7a1fc588c1c7d3e/2/SomaticSeq, genome_reference=/b9c990ead1ec4234ade4263b3a88efcb/hg19.fasta, truth_snv=/b51f64272bed4e1986ec80a404c7f61f/sim7.snv.True_VCF, truth_indel=None, classifier_snv=None, classifier_indel=None, pass_threshold=0.5, lowqual_threshold=0.1, algorithm=xgboost, homozygous_threshold=0.85, heterozygous_threshold=0.01, minimum_mapping_quality=1, minimum_base_quality=5, minimum_num_callers=0.5, dbsnp_vcf=None, cosmic_vcf=None, inclusion_region=/349982c0d3fc45a78ab23eaf6721355f/2.bed, exclusion_region=None, threads=1, somaticseq_train=False, seed=0, tree_depth=12, iterations=None, features_excluded=[], keep_intermediates=False, tumor_bam_file=/262b1fea0f8545d2be2fb06a1e9d1631/sim7.snv.bam, normal_bam_file=/f82f61aca55f46ac8867c7e4514b7362/sim15.normal.bam, tumor_sample=TUMOR, normal_sample=NORMAL, mutect_vcf=None, indelocator_vcf=None, mutect2_vcf=/ee7008ee88274839a7a1fc588c1c7d3e/2/MuTect2.vcf, varscan_snv=None, varscan_indel=None, jsm_vcf=None, somaticsniper_vcf=None, vardict_vcf=/ee7008ee88274839a7a1fc588c1c7d3e/2/VarDict.vcf, muse_vcf=None, lofreq_snv=None, lofreq_indel=None, scalpel_vcf=None, strelka_snv=/ee7008ee88274839a7a1fc588c1c7d3e/2/Strelka/results/variants/somatic.snvs.vcf.gz, strelka_indel=/ee7008ee88274839a7a1fc588c1c7d3e/2/Strelka/results/variants/somatic.indels.vcf.gz, tnscope_vcf=None, platypus_vcf=None, which=paired
INFO 2021-12-02 09:57:41,189 SomaticSeq SomaticSeq Input Arguments: output_directory=/bd7a3b6e30bd4590aae11c6b80ed265e/1/SomaticSeq, genome_reference=/403b623c096145d0b0e494e3ecdc0a54/hg19.fasta, truth_snv=/d0e4ecdf8de14d84bb66be656d74044a/sim7.snv.True_VCF, truth_indel=None, classifier_snv=None, classifier_indel=None, pass_threshold=0.5, lowqual_threshold=0.1, algorithm=xgboost, homozygous_threshold=0.85, heterozygous_threshold=0.01, minimum_mapping_quality=1, minimum_base_quality=5, minimum_num_callers=0.5, dbsnp_vcf=None, cosmic_vcf=None, inclusion_region=/74e62e0f44844fdf97041bde7f29affc/1.bed, exclusion_region=None, threads=1, somaticseq_train=False, seed=0, tree_depth=12, iterations=None, features_excluded=[], keep_intermediates=False, tumor_bam_file=/bbe6b9729a8c4e53b903b5c610702ef8/sim7.snv.bam, normal_bam_file=/c3bd5c15a97e45aab3c4a226cff9b5ad/sim15.normal.bam, tumor_sample=TUMOR, normal_sample=NORMAL, mutect_vcf=None, indelocator_vcf=None, mutect2_vcf=/bd7a3b6e30bd4590aae11c6b80ed265e/1/MuTect2.vcf, varscan_snv=None, varscan_indel=None, jsm_vcf=None, somaticsniper_vcf=None, vardict_vcf=/bd7a3b6e30bd4590aae11c6b80ed265e/1/VarDict.vcf, muse_vcf=None, lofreq_snv=None, lofreq_indel=None, scalpel_vcf=None, strelka_snv=/bd7a3b6e30bd4590aae11c6b80ed265e/1/Strelka/results/variants/somatic.snvs.vcf.gz, strelka_indel=/bd7a3b6e30bd4590aae11c6b80ed265e/1/Strelka/results/variants/somatic.indels.vcf.gz, tnscope_vcf=None, platypus_vcf=None, which=paired
Error: Unable to open file /ee7008ee88274839a7a1fc588c1c7d3e/2/MuTect2.vcf. Exiting.
Error: Unable to open file /bd7a3b6e30bd4590aae11c6b80ed265e/1/MuTect2.vcf. Exiting.
Traceback (most recent call last):
File "/usr/local/bin/run_somaticseq.py", line 4, in
Please help me analyze this problem again. Thank you very much.
Seems like MuTect2 still didn't run successfully. Try executing the mutect2.*.cmd script again, and let me know what the error/log output from MuTect2 is.
Did MuTect2 run? Also make sure you have bedtools on your path.
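One way to confirm the tool is visible is a small lookup like the one below. This is just a sketch using the standard library; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(names):
    """Return the subset of executable names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# An empty list means every tool was found.
not_found = missing_tools(["bedtools"])
```

Run it in the same environment (or container) that executes the SomaticSeq scripts, since PATH can differ between your login shell and the workflow.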
Closing this issue for now. Feel free to re-open it if you're still having issues.
Hi, I get an error:
/somaticseq/vcfModifier/modify_MuTect2.py", line 64, in convert
normal_index = header.index(normal_name) - 9
UnboundLocalError: local variable 'normal_name' referenced before assignment
Can you help me deal with this problem? Thanks.