rebber opened 10 months ago
Yeah, MNVs and complex variants are limitations because they just haven't been our focus. For VarDict, MNVs are parsed into multiple SNVs. I can do something similar for MuTect2 and other outputs, i.e., parse MNVs into multiple SNVs and complex variants into SNVs and indels. If you can give me examples of complex variants and MNVs from those outputs, I can incorporate them.
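Here is a minimal sketch of the MNV splitting described above. It is not somaticseq's actual VarDict code. It assumes REF and ALT have equal length, as in a typical MNV, and uses a hypothetical `split_mnv` helper. Each differing position becomes its own SNV at an offset from the original POS:

```python
from typing import List, Tuple

# A variant as (chrom, pos, ref, alt); pos is 1-based as in VCF.
Variant = Tuple[str, int, str, str]


def split_mnv(chrom: str, pos: int, ref: str, alt: str) -> List[Variant]:
    """Hypothetical helper: split an MNV into per-base SNVs.

    Assumes len(ref) == len(alt); positions where REF and ALT agree
    are dropped, since they are not variants on their own.
    """
    if len(ref) != len(alt):
        raise ValueError("not an MNV: REF and ALT lengths differ")
    snvs = []
    for offset, (r, a) in enumerate(zip(ref, alt)):
        if r != a:
            snvs.append((chrom, pos + offset, r, a))
    return snvs


if __name__ == "__main__":
    # GCA>TCT at chr1:1000 differs at the 1st and 3rd bases.
    print(split_mnv("chr1", 1000, "GCA", "TCT"))
    # [('chr1', 1000, 'G', 'T'), ('chr1', 1002, 'A', 'T')]
```

After splitting, each SNV is a separate record, so a downstream annotator such as VEP sees two independent single-base changes rather than one combined change.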
Thanks for the quick reply! Primarily, we want to keep MNVs and complex variants intact so that VEP annotates them properly. We will therefore look into other solutions for merging variants from different callers.
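One possible direction for such a merge keys records on the full (CHROM, POS, REF, ALT) tuple, so MNVs and complex variants pass through as single records. This sketch assumes both callers' VCFs have been normalized the same way beforehand (for example with `bcftools norm`). The function name and caller labels below are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def merge_calls(calls_by_caller: Dict[str, Iterable[Key]]) -> Dict[Key, Set[str]]:
    """Hypothetical union merge: map each variant key to the callers that reported it.

    Records are never split, so MNVs and complex variants stay intact.
    """
    merged: Dict[Key, Set[str]] = defaultdict(set)
    for caller, calls in calls_by_caller.items():
        for key in calls:
            merged[key].add(caller)
    return dict(merged)


if __name__ == "__main__":
    mutect2 = [("chr1", 1000, "GCA", "TCT"), ("chr1", 2000, "C", "T")]
    sage = [("chr1", 1000, "GCA", "TCT"), ("chr2", 500, "AG", "A")]
    for key, callers in sorted(merge_calls({"mutect2": mutect2, "sage": sage}).items()):
        print(key, sorted(callers))
```

A limitation of exact-key matching is that if one caller reports GCA>TCT as a single MNV and the other reports it as separate SNVs, the records will not match. Reconciling those would need an extra representation-normalization step.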
Hi,
We use somaticseq only to merge variants from Mutect2 and HMF Tools SAGE (the latter as "arbitrary" VCFs); the classification module is currently not used. However, we were missing some multi-nucleotide variants (MNVs) in the somaticseq output, so I looked into the somaticseq code to see how they are handled. It seems that any variant in the input VCFs whose REF and ALT are both longer than one base is ignored.
I see the following division into SNVs or indels in both modify_ssMuTect2.py and splitVcf.py (the latter used to prepare arbitrary VCFs):
Any other variants, i.e., those satisfying
len(vcf_i.refbase) > 1 and len(vcf_i.altbase) > 1
are skipped. Is it correct that MNVs and complex variants are ignored? What was the reasoning behind this design? Is there any way to work around it?
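The behaviour observed above can be illustrated with a minimal length-based classifier. This is a reconstruction from the quoted condition, not somaticseq's actual code. The function name `classify` is hypothetical, and the sketch assumes REF and ALT are non-empty strings:

```python
from typing import Optional


def classify(ref: str, alt: str) -> Optional[str]:
    """Hypothetical length-based classifier mirroring the observed logic.

    Returns "SNV", "INDEL", or None when the record would be skipped.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    # Exactly one allele is a single base: insertion or deletion.
    if len(ref) == 1 or len(alt) == 1:
        return "INDEL"
    # len(ref) > 1 and len(alt) > 1: MNVs and complex variants fall through.
    return None


if __name__ == "__main__":
    for ref, alt in [("C", "T"), ("C", "CAG"), ("CAG", "C"), ("GC", "TT"), ("ACG", "TT")]:
        print(ref, alt, classify(ref, alt))
```

Under this logic, GC>TT (an MNV) and ACG>TT (a complex variant) both return None. This matches the observation that such records never reach the merged output.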
We do not want to miss these types of variants and will have to look into other tools if we can't avoid this behaviour with somaticseq.
Best regards, Rebecka