Closed SAADAT-Abu closed 8 months ago
This looks like the MuTect2's VCF isn't as expected. Commonly due to either of the 2 reasons:
1) Did you use the Mutect2 in GATK4, or
2) Did the tumor and normal bam files have the same SM (i.e., same sample name)? MuTect2 requires that the tumor and normal have different SM in @RG
to distinguish tumor and normal reads.
Yes. I ran Mutect2 in GATK. Mutect2 Version="4.2.3.0". Format of my VCF is different from the one in the provided example.
My VCF:
##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=FAD,Number=R,Type=Integer,Description="Count of fragments supporting each allele.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --panel-of-normals /mnt/beegfs/asaadat/Sarek/iGenome/1000g_pon.hg38.vcf.gz --germline-resource /mnt/beegfs/asaadat/Sarek/iGenome/af-only-gnomad.hg38.vcf.gz --output /mnt/beegfs/asaadat/Mutect_calls/output/Job6//WT18.vcf --input:tumor /mnt/beegfs/asaadat/recalibrated/WT18/WT18.recal.cram:tumor --input:normal /mnt/beegfs/as
It does not have --tumor-name and --normal-name. Is that the problem?
Did you have paired normal bam file, or just a single tumor bam file?
If you have paired normal bam file, what are the tumor and normal headers that start with @RG
?
I am using paired normal/ tumor bam. Normal header: @RG ID:HGHLKDMXY.HK10.L001 PU:L001 SM:P10_HK10 LB:HK10 DS:/mnt/beegfs/asaadat/Sarek/iGenome/Homo_sapiens_assembly38.fasta PL:ILLUMINA
Tumor header: @RG ID:HGHLKDMXY.WT10.L001 PU:L001 SM:P10_WT10 LB:WT10 DS:/mnt/beegfs/asaadat/Sarek/iGenome/Homo_sapiens_assembly38.fasta PL:ILLUMINA
Thanks a lot.
I'll need to check if GATK 4.2.x.x has changed output VCF header......
Ok. Here is a VCF from my analysis. https://drive.google.com/drive/folders/1zztdjGAFW9y_CWLzTgxK82wUNjYq3tar?usp=sharing
Thanks. A.Saadat
You have two "tumor_sample" in your VCF header:
##tumor_sample=P6_HK6
##tumor_sample=P6_WT6
What was your Mutect2 command exactly?
No. HK6 is the healthy, WT6 is the tumor.
The code is here:
# Run MuTect2 using GATK
gatk Mutect2 \
-R "$reference_genome" \
-I:tumor "$tumor" \
-I:normal "$normal" \
-O "$output_vcf" \
--germline-resource "$germline_resource" \
--panel-of-normals "$panel_of_normals"
It was called with crams Here is the CSV from which samples were read. WT6 /mnt/beegfs/asaadat/recalibrated/WT6/WT6.recal.cram /mnt/beegfs/asaadat/recalibrated/HK6/HK6.recal.cram
I tried to run SomaticSeq with CRAMs too. but got the same error
Mutect2 is regarding both samples as tumor samples. The VCF header has both of the samples as ##tumor_sample
.
Try instead (not sure if --panel-of-normals
is compatible with the following though).
gatk Mutect2 \
-R "$reference_genome" \
-I HK6.recal.cram \
-I WT6.recal.cram \
-tumor P6_WT6 \ # Whatever after SM: in tumor bam
-normal P6_HK6 \ # Whatever after SM: in normal bam
-O "$output_vcf" \
--germline-resource "$germline_resource" \
--panel-of-normals "$panel_of_normals"
Followed your suggestion. Now my VCFs are looking ok. Thanks a lot.
Can someone please help me with this error: