fpbarthel / GLASS

GLASS consortium
MIT License

Mutect2 and VarScan2 Filter Assessments #47

Closed Kcjohnson closed 6 years ago

Kcjohnson commented 6 years ago

In the GLASS-WG cohort there are several samples (TCGA GBM/LGG) that have been analyzed by other variant calling pipelines, including multiple callers in the Pan-Cancer Analysis of Whole Genomes (PCAWG) analysis. It would be helpful to benchmark the GLASS-WG Mutect2/VarScan2 calls against these existing data. These analyses may also help us derive our own Mutect2/VarScan2 consensus calls for the entire GLASS-WG cohort.

SNVs from TCGA samples:

To make this analysis generalizable, we are going to put these commands through snakemake and R to generate reports for variant call overlaps:

Kcjohnson commented 6 years ago

The PCAWG dataset includes 10 samples that are also listed in the GLASS-WG dataset. The PCAWG VCF files (INDELs and SNV_MNVs are stored separately) have been generated by combining calls from at least 2 of the following callers: Broad, Sanger, DKFZ, and MuSE. For the purposes of this project, we can consider these samples as a truth set with which to compare our in-house Mutect2 and VarScan2 results. We also intend to use these results to recalibrate the callers depending on SNV overlap.
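
A minimal sketch of how a callset could be scored against the PCAWG truth set. It assumes variants are matched on exact (chrom, pos, ref, alt) keys taken from plain-text VCF data lines, which may under-count matches for indels that are represented differently between pipelines. The function names `variant_keys` and `precision_recall` are hypothetical and not part of the GLASS pipeline.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def variant_keys(vcf_lines: Iterable[str], pass_only: bool = True) -> Set[VariantKey]:
    """Collect (chrom, pos, ref, alt) keys from the data lines of a VCF."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, filt = fields[:7]
        if pass_only and filt not in ("PASS", "."):
            continue  # drop records that failed at least one filter
        for alt in alts.split(","):  # one key per alternate allele
            keys.add((chrom, int(pos), ref, alt))
    return keys


def precision_recall(calls: Set[VariantKey], truth: Set[VariantKey]) -> Tuple[float, float]:
    """Precision = TP / |calls|; recall (sensitivity) = TP / |truth|."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

In practice the lines would come from an opened VCF file per sample, and the same truth set would be scored once against Mutect2 and once against VarScan2.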

While I still need to generate some precision/accuracy estimates for VarScan2 and Mutect2 with these data, I was able to generate a few Venn diagrams of the overlaps between Mutect2, VarScan2, and PCAWG. Overall, these results indicate that the callers are performing fairly well. We could relax the VarScan2 requirements to increase its sensitivity.

Attachments: TCGA-06-0190-R1-SNVMNV_Venn.pdf, TCGA-14-1034-R2-SNVMNV_Venn.pdf
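
For reference, the seven region counts behind a three-way Venn diagram can be computed with plain set operations. This is only a sketch: it assumes each callset has already been reduced to a set of hashable variant keys, and the function name `venn3_counts` is hypothetical.

```python
from typing import Dict, Hashable, Set


def venn3_counts(
    mutect2: Set[Hashable], varscan2: Set[Hashable], pcawg: Set[Hashable]
) -> Dict[str, int]:
    """Count variants in each of the seven regions of a three-set Venn diagram."""
    return {
        "mutect2_only": len(mutect2 - varscan2 - pcawg),
        "varscan2_only": len(varscan2 - mutect2 - pcawg),
        "pcawg_only": len(pcawg - mutect2 - varscan2),
        "mutect2_varscan2_only": len((mutect2 & varscan2) - pcawg),
        "mutect2_pcawg_only": len((mutect2 & pcawg) - varscan2),
        "varscan2_pcawg_only": len((varscan2 & pcawg) - mutect2),
        "all_three": len(mutect2 & varscan2 & pcawg),
    }
```

The seven counts sum to the size of the union of the three callsets, which is a quick sanity check before plotting.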

For the rest of the dataset, which does not have PCAWG calls for comparison, we can examine the consensus calls between Mutect2 and VarScan2. We observed that very few SNVs are called for the low-pass cohorts (HF and MD-LP). For HK and TCGA, the major observation was that VarScan2 calls were more conservative. The final callsets for the GLASS-WG data will incorporate the consensus calls between Mutect2 and VarScan2. Thus, we can relax the criteria on VarScan2 in order to recover more true positives.

Figures: hk-consensus-calls-zoomed, tcga-consensus-calls, mda-lp-consensus-calls
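
A rough sketch of the consensus step, assuming each caller's output has been loaded into a dictionary mapping sample name to a set of (chrom, pos, ref, alt) keys. The names `consensus_calls` and `consensus_summary` and the sample identifiers in the usage note are hypothetical.

```python
from typing import Dict, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
CallsBySample = Dict[str, Set[VariantKey]]


def consensus_calls(mutect2: CallsBySample, varscan2: CallsBySample) -> CallsBySample:
    """Keep, per sample, only the variants reported by both callers."""
    shared_samples = mutect2.keys() & varscan2.keys()
    return {s: mutect2[s] & varscan2[s] for s in sorted(shared_samples)}


def consensus_summary(mutect2: CallsBySample, varscan2: CallsBySample) -> Dict[str, Dict[str, int]]:
    """Per-sample counts: calls from each caller and the size of their overlap."""
    consensus = consensus_calls(mutect2, varscan2)
    return {
        s: {
            "mutect2": len(mutect2[s]),
            "varscan2": len(varscan2[s]),
            "consensus": len(shared),
        }
        for s, shared in consensus.items()
    }
```

A sample where the `varscan2` count is much smaller than the `mutect2` count, with nearly all VarScan2 calls in the consensus, would match the pattern of VarScan2 being the more conservative caller.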

Kcjohnson commented 6 years ago

Update: Examples of filters applied across all cohorts and the number of variants (SNVs and indels) called by Mutect2.

Figures: glass-wg-coverage-vs-m2calls, glass-wg-total-filters
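
A per-filter tally like the one summarized above could be produced along these lines. This is a sketch that relies only on the standard VCF layout, where the seventh column (FILTER) holds either `PASS` or a semicolon-separated list of failed filters; the function name `tally_filters` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_filters(vcf_lines: Iterable[str]) -> Counter:
    """Count how many records carry each FILTER value (including PASS)."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        filter_field = line.rstrip("\n").split("\t")[6]
        # A record can fail several filters at once; count each of them.
        for name in filter_field.split(";"):
            counts[name] += 1
    return counts
```

Because a record can fail more than one filter, the per-filter counts can add up to more than the number of records.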

We still need to determine how/whether to merge Mutect2 results with VarScan2.

fpbarthel commented 6 years ago

We need to compare this callset to newer calls generated with a more restrictive cohort-wide panel of normals (PON).

fpbarthel commented 6 years ago

Closing this issue for now, since 1) we concluded that the Mutect2 (M2) filters seem to be appropriate and 2) we are going single-caller for the sake of time.