bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

how to obtain all variants where the "FILTER" column is not labeled as "PASS" #128

Closed xiechangxiao closed 7 months ago

xiechangxiao commented 7 months ago

Hi @litaifang,

Thank you for inventing such a useful tool. While studying this software over the past few days, I discovered that it only processes variants in VCF files where the "FILTER" column is labeled as "PASS". For example, when I used SomaticSeq's consensus mode to merge VCF files from multiple callers (MuTect2, VarScan2, VarDict), the resulting Consensus.sSNV.vcf file did not include all the variants; many non-PASS variants were filtered out. I would like to know the reason behind this filtering approach and how I can obtain results for all variants if desired.

litaifang commented 7 months ago

The VCF should include variants that are considered somatic mutations by at least one tool. The outputs of VarDict and VarScan2, for instance, include far more variant calls than what those tools consider to be "likely somatic mutations," such as germline variants or very low-quality calls that have a very low chance of being actual mutations.
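
For anyone who wants to inspect the non-PASS records of a caller's VCF directly, here is a minimal sketch in plain Python. It is not part of SomaticSeq and does not reproduce how SomaticSeq decides what to keep; it only relies on the VCF convention that FILTER is the 7th tab-separated column. The function name `split_by_filter`, the toy records, and the filter label `germline_risk` are made up for illustration.

```python
import io


def split_by_filter(vcf_lines):
    """Partition VCF lines into header, PASS records, and non-PASS records.

    Hypothetical helper for illustration; not a SomaticSeq function.
    """
    header, passed, not_passed = [], [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            header.append(line)  # meta-information and column header lines
            continue
        fields = line.split("\t")
        # FILTER is the 7th column (index 6) of a VCF data line.
        if fields[6] == "PASS":
            passed.append(line)
        else:
            # Anything else lands here, including "." (filters not applied).
            not_passed.append(line)
    return header, passed, not_passed


# Toy input with made-up records, standing in for a real caller's VCF.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\n"
    "chr1\t200\t.\tG\tC\t.\tgermline_risk\t.\n"
    "chr2\t300\t.\tC\tA\t.\t.\t.\n"
)

header, passed, not_passed = split_by_filter(example_vcf)
for record in not_passed:
    print(record)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` object (for a gzipped VCF, open it with `gzip.open(path, "rt")`). Note that this sketch treats a FILTER value of `.` as non-PASS, which may or may not be what you want.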