broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Variant lost when interval expanded. #8021

Open wangshun1121 opened 2 years ago

wangshun1121 commented 2 years ago

GATK version: gatk-4.2.3

I have two BAM files: Tumor.bam contains cfDNA reads, and Normal.bam contains reads from white blood cells.

There is a C>G variant at chr7:116795782 (hg38), which I can call with the following command:

gatk --java-options -Xmx4000m Mutect2  \
  -R /Homo_sapiens_assembly38.fasta \
  -I Tumor.bam \
  -I Normal.bam \
  -normal GA03009QX \
  -L chr7:116795653-116795791 \
  --interval-padding 100 \
  -O output.vcf

and the variant shows up in output.vcf:

##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
...
##filtering_status=Warning: unfiltered Mutect 2 calls.  Please run FilterMutectCalls to remove false positives.
##normal_sample=GA03009QX
##source=Mutect2
##tumor_sample=GA03009CF
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  GA03009CF   GA03009QX
chr7    116795782   .   C   G   .   .   AS_SB_TABLE=1157,1223|8,9;DP=2519;ECNT=1;MBQ=20,20;MFRL=169,173;MMQ=60,60;MPOS=30;NALOD=1.23;NLOD=4.82;POPAF=6.00;TLOD=20.23    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:2357,17:8.145e-03:2374:739,8:746,4:1570,12:1145,1212,8,9   0/0:23,0:0.056:23:5,0:11,0:16,0:12,11,0,0
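To make the low allele fraction of this call concrete, a small Python sketch can pair the FORMAT keys with each sample column and compute the raw alt-read fraction from AD. The helper names format_to_dict and ad_vaf are hypothetical illustrations, not part of GATK; the column strings are copied from the record above.

```python
# Hypothetical helpers (not part of GATK): pair a VCF FORMAT string with one
# sample column and compute the alt-read fraction from the AD field.


def format_to_dict(fmt, sample):
    """Map FORMAT keys (e.g. GT, AD) to the values in one sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def ad_vaf(ad):
    """Fraction of reads supporting any ALT allele, from an AD value like '2357,17'."""
    counts = [int(x) for x in ad.split(",")]
    total = sum(counts)
    return sum(counts[1:]) / total if total else 0.0


# FORMAT and sample columns copied from the chr7:116795782 record.
FMT = "GT:AD:AF:DP:F1R2:F2R1:FAD:SB"
TUMOR = "0/1:2357,17:8.145e-03:2374:739,8:746,4:1570,12:1145,1212,8,9"
NORMAL = "0/0:23,0:0.056:23:5,0:11,0:16,0:12,11,0,0"

for name, column in (("GA03009CF", TUMOR), ("GA03009QX", NORMAL)):
    fields = format_to_dict(FMT, column)
    vaf = ad_vaf(fields["AD"])
    print(name, "AD-based VAF:", round(vaf, 5), "reported AF:", fields["AF"])
```

For the tumor sample this gives 17/2374, about 0.0072, a bit below the 8.145e-03 that Mutect2 writes in AF, so the AF field is evidently not just the raw AD ratio. The normal sample has 0 alt reads out of 23.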

Then I add four other exon regions alongside the exon containing my variant, listing all five in interval_list.bed:

chr7    116781986   116782097
chr7    116783302   116783469
chr7    116795653   116795791
chr7    116795885   116796124
chr7    129189150   129189482

and run the same command, with -L now pointing to the BED file:

gatk --java-options -Xmx4000m Mutect2  \
  -R /Homo_sapiens_assembly38.fasta \
  -I Tumor.bam \
  -I Normal.bam \
  -normal GA03009QX \
  -L interval_list.bed \
  --interval-padding 100 \
  -O output.vcf

and the variant at chr7:116795782 is LOST:

##filtering_status=Warning: unfiltered Mutect 2 calls.  Please run FilterMutectCalls to remove false positives.
##normal_sample=GA03009QX
##source=Mutect2
##tumor_sample=GA03009CF
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  GA03009CF   GA03009QX
chr7    116781913   .   T   C   .   .   AS_SB_TABLE=0,0|368,162;DP=540;ECNT=1;MBQ=0,20;MFRL=0,179;MMQ=60,60;MPOS=21;NALOD=-1.163e+01;NLOD=-1.133e+01;POPAF=6.00;TLOD=1695.59    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:0,527:0.997:527:0,190:0,194:0,384:0,0,365,162   0/0:0,3:0.800:3:0,2:0,1:0,3:0,0,3,0
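Because interval_list.bed is a BED file, its start coordinates are 0-based, and with --interval-padding 100 neighboring padded exons can overlap. The Python sketch below shows how the five BED rows might resolve into processing regions, assuming padding is applied to each interval first and overlapping or adjacent padded intervals are then merged (GATK's default --interval-merging-rule ALL). The helper names are hypothetical; this illustrates the interval arithmetic, not GATK's implementation.

```python
# Sketch: how a BED file plus --interval-padding 100 might resolve into the
# regions Mutect2 processes. Assumptions: BED starts are 0-based and ends
# exclusive; padding is applied per interval; overlapping or adjacent padded
# intervals are then merged (GATK's default --interval-merging-rule ALL).

PADDING = 100  # same value as --interval-padding in both commands


def bed_to_one_based(start0, end):
    """BED [start0, end) -> 1-based inclusive (start, end)."""
    return start0 + 1, end


def pad_and_merge(intervals, padding):
    """Pad 1-based inclusive intervals, then merge overlapping/adjacent ones."""
    # Sorting by contig name is only safe here because every row is on chr7.
    padded = sorted((c, max(1, s - padding), e + padding) for c, s, e in intervals)
    merged = []
    for contig, start, end in padded:
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            # Overlapping or adjacent on the same contig: extend the previous one.
            prev_contig, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


bed_rows = [
    ("chr7", 116781986, 116782097),
    ("chr7", 116783302, 116783469),
    ("chr7", 116795653, 116795791),
    ("chr7", 116795885, 116796124),
    ("chr7", 129189150, 129189482),
]
regions = pad_and_merge([(c, *bed_to_one_based(s, e)) for c, s, e in bed_rows], PADDING)

# The first command's -L chr7:116795653-116795791 is already 1-based inclusive.
single = pad_and_merge([("chr7", 116795653, 116795791)], PADDING)

for contig, start, end in regions:
    print(f"{contig}:{start}-{end}")
print("single-interval run:", single)
```

Under these assumptions the third and fourth BED rows merge into chr7:116795554-116796224, so the region containing chr7:116795782 is larger than the chr7:116795553-116795891 window of the first run. The T>C call at chr7:116781913 falls inside the padded first interval (chr7:116781887-116782197). The BED row for the original exon also starts one base later (116795654) than the -L string, because of the 0-based BED convention.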

Both BAM files are attached: bams.zip

davidbenjamin commented 1 year ago

@wangshun1121 The allele fraction here is low enough that I'm not too surprised a change in intervals affects it. However, the TLOD is high enough that I wish it were called regardless of intervals. I will take a look at your BAMs if I get the time, but most of my effort is going into the next version of Mutect, which unfortunately means some neglect of Mutect2.