Kennedy-Lab-UW / Duplex-Seq-Pipeline

A standalone end-to-end data analysis pipeline for Duplex Sequencing

Variants in VCF aren't always used in calculating mutation frequencies #111

Closed scottrk closed 1 year ago

scottrk commented 1 year ago

I've noticed instances where a variant (typically an indel) is present in the VCF file but is not included when mutation frequencies are calculated per exon/block.

The overall sums are correct, but the counts for some specific genes are off. This only happens when using blocks: at the whole-gene level, both the gene-specific and overall counts are correct. The unique setting doesn't change this; the per-block count for the gene is still wrong while the overall count is right. It's not at all obvious why some variants are being skipped. They are in the VCF and mut files and are well within the block/exon boundaries, so it's not as simple as falling between blocks.

A prime example is a deletion in FGFR3 in sample r87078. There are two -3 deletions, one with 42 supporting reads and the other with only a single supporting read. The gene-level counts are correct and include both variants, for a total of 43 (or 2 with unique). The block-level counts miss the variant with a single supporting read, giving 42 (or 1).
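For reference, here is a minimal sketch of the cross-check I would expect to hold. It is not the pipeline's code: the `Interval` and `Variant` types, the helper names, and all coordinates and alleles are hypothetical and only mirror the read counts described above (42 and 1). It assumes 1-based inclusive intervals and VCF-style deletions where REF is longer than ALT.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class Variant(NamedTuple):
    chrom: str
    pos: int    # 1-based VCF POS
    ref: str
    alt: str
    reads: int  # supporting reads for this variant


def inside(v: Variant, iv: Interval) -> bool:
    """True if the whole reference span of v lies within iv."""
    last = v.pos + len(v.ref) - 1
    return v.chrom == iv.chrom and iv.start <= v.pos and last <= iv.end


def count(variants, iv: Interval, unique: bool = False) -> int:
    """Sum supporting reads inside iv, or count each variant once if unique."""
    return sum(1 if unique else v.reads for v in variants if inside(v, iv))


def unassigned(variants, blocks):
    """Variants that fall inside none of the blocks."""
    return [v for v in variants if not any(inside(v, b) for b in blocks)]


# Hypothetical coordinates and alleles, for illustration only.
gene = Interval("GENE", "chrN", 1000, 3000)
blocks = [Interval("block1", "chrN", 1000, 1999),
          Interval("block2", "chrN", 2000, 3000)]
variants = [Variant("chrN", 1500, "ACGT", "A", 42),  # -3 deletion, 42 reads
            Variant("chrN", 1600, "ACGT", "A", 1)]   # -3 deletion, 1 read

gene_total = count(variants, gene)
block_total = sum(count(variants, b) for b in blocks)
print(gene_total, block_total, unassigned(variants, blocks))
```

When every variant sits fully inside a block, the summed block counts should equal the gene count (43 here, or 2 with `unique=True`), and `unassigned` should be empty. In the FGFR3 case the block-level total comes out one variant short even though nothing should be unassigned.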

r87078_L200205_1_S200221.dcs.final.gene-burden-countmuts.csv
r87078_L200205_1_S200221.dcs.final.exons-burden-countmuts.csv
r87078_L200205_1_S200221.dcs.somatic.vcf.txt