broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

MuTect2 Visibly False Negative Variants #7648

Open DarioS opened 2 years ago

DarioS commented 2 years ago

I have a whole genome sequencing data set comprising adjacent normals, primary tumours and cell cultures (CC) derived from the tumours. For each person, I looked at the intersection and difference between the variants identified in the tumour and in the CC. About 80% of variants are identified in both tissue and CC, but about 20% are identified in only one of the two samples. An example:

image

The variant is PASS in the tissue sample, with a variant frequency of 22%, but is entirely missing from the CC sample's VCF, even though the variant frequency at the same position in the CC is 32%. Adjacent normals and CCs have approximately 30× coverage and tumours approximately 60×. Roughly equal numbers of mutations are unique to each sample, so just as many mutations called in the CC are missing from the tumour. From looking at five random discrepant locations, all appear to be genuinely present in both CC and tumour.

Has Mutect2 been evaluated with technical replicates of a particular tumour sample or, as in my case, with cell cultures and primary tumours of the same cancer? I used the university core bioinformatics facility's implementation of the GATK best practices workflow for somatic short variant calling, written for a PBS Professional cluster. The VCFs are soft-filtered (i.e. most of the variants in them don't have a PASS value in the FILTER column), so it's surprising that the variant above is totally missing from the CC's VCF file.
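A minimal sketch of this kind of tumour versus CC comparison, assuming uncompressed VCF text and keying each call by chromosome, position, REF and ALT. The function names and the toy records are hypothetical and are not taken from the workflow or data described above. The point is to separate calls that are present but soft-filtered in the other sample from calls that are absent from its VCF altogether.

```python
def read_variants(lines):
    """Map (chrom, pos, ref, alt) -> FILTER value for each record in VCF text lines."""
    variants = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        for alt in alts.split(","):  # one key per ALT allele at multi-allelic sites
            variants[(chrom, pos, ref, alt)] = filt
    return variants


def status_in_other(key, other):
    """Report how a call from one sample appears in the other sample's VCF."""
    if key not in other:
        return "absent"
    return "PASS" if other[key] == "PASS" else "filtered: " + other[key]


# Toy records, made up for illustration only.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
tumour_lines = [
    "##fileformat=VCFv4.2",
    HEADER,
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.",
]
culture_lines = [
    "##fileformat=VCFv4.2",
    HEADER,
    "chr1\t100\t.\tA\tG\t.\tweak_evidence\t.",
    "chr2\t300\t.\tG\tT\t.\tPASS\t.",
]

tumour = read_variants(tumour_lines)
culture = read_variants(culture_lines)

shared = tumour.keys() & culture.keys()
tumour_only = tumour.keys() - culture.keys()
culture_only = culture.keys() - tumour.keys()

# For every PASS call in the tumour, show how it looks in the CC.
for key, filt in sorted(tumour.items()):
    if filt == "PASS":
        print(key, "->", status_in_other(key, culture))
```

In a soft-filtered VCF, a discrepant call would be expected to show up as "filtered" rather than "absent", which is why the example variant stands out.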

DarioS commented 2 years ago

Another view of the same variant location, but sorted. The location is hg38 chr7:142044831-142044925, with the variant at 142044845.

image

droazen commented 2 years ago

@davidbenjamin Tagging you on this one