Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

[Bug] VarDict produces SVs in the SNV file #1344

Open dnil opened 10 months ago

dnil commented 10 months ago

Describe the bug: VarDict SVs are once again ending up in the SNV VCF files, which makes them hard for the user to visualise when loaded in Scout.

To Reproduce: Load e.g. case novelbear (or presumably grep for DEL, DUP, BND etc. in recent VarDict SNV VCFs). Or run, for example:

zgrep 25956141 /home/proj/production/housekeeper-bundles/novelbear/2023-12-08/SNV.somatic.novelbear.vardict.clinical.filtered.pass.vcf.gz |grep SVLEN |wc -l
1
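The same check can be sketched in Python, for cases where a plain grep is too coarse. This is only a minimal sketch: it assumes that SV records can be recognised by `SVTYPE` or `SVLEN` keys in the INFO column or by a symbolic ALT allele such as `<DEL>`, and the function names are hypothetical, not part of Balsamic. Unlike the `zgrep` above, it does not filter on a position and it ignores header lines.

```python
import gzip

# Assumption: SV records carry at least one of these INFO keys.
SV_INFO_KEYS = {"SVTYPE", "SVLEN"}


def is_sv_record(line):
    """Return True if a VCF data line looks like a structural variant."""
    fields = line.rstrip("\n").split("\t")
    # Header lines and malformed lines are never counted as SVs.
    if line.startswith("#") or len(fields) < 8:
        return False
    alt, info = fields[4], fields[7]
    # Collect the INFO keys, with or without a value (KEY=VALUE or KEY).
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}
    # Symbolic ALT alleles such as <DEL> or <DUP> start with "<".
    return alt.startswith("<") or bool(keys & SV_INFO_KEYS)


def count_sv_records(lines):
    """Count SV-like records in an iterable of VCF lines."""
    return sum(1 for line in lines if is_sv_record(line))


def count_sv_records_in_file(path):
    """Count SV-like records in a plain or gzip-compressed VCF file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_sv_records(handle)
```

Calling `count_sv_records_in_file` on an SNV VCF and getting a non-zero count would indicate the problem described here.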

Expected behavior: Short nucleotide variants and structural variants should appear in different VCF files.

Screenshots: two images (Screenshot 2023-12-11 at 16 39 32, Screenshot 2023-12-11 at 16 40 29)

Version: 12.0.2

Additional context: This has been an issue in the distant past. I believe it was never fully solved together with the VarDict devs, but was worked around by having Balsamic filter the VarDict VCFs into separate files for display and delivery.

mathiasbio commented 10 months ago

Hmm, I wonder if the fix was only implemented for TNscope (https://github.com/Clinical-Genomics/BALSAMIC/pull/540/files), even though VarDict was mentioned in the original issue (https://github.com/Clinical-Genomics/BALSAMIC/issues/485). Since VarDict is only used for TGA, this has possibly been less of an issue than for TNscope, which is used for WGS: there is less of a chance that SVs are called in the smaller panel context. I took a quick look at VCFs produced by some TGA and WES cases, and the VarDict SVs seem to be more common in WES, which makes sense since it's a larger panel.

I suppose this issue has been around for a while and is probably not super urgent, but I'll bring it up at a refinement session. Possibly we should disable SV calling in VarDict, or separate the SV calls and add them to the SVDB merge. But since I have no idea how good VarDict is at calling SVs, I'm not ready to say that we should do that yet.

mathiasbio commented 9 months ago

Decision at the refinement meeting on 2024-01-12: just remove the SV calls from VarDict.

mathiasbio commented 6 months ago

An update to this issue: We will need to keep the SVs in VarDict for now, as this is how we are calling FLT3-ITD at the moment and the clinicians are looking for this variant in the SNV and InDel results. But it would be nice if we could work out a way to clean this up for the future...

mathiasbio commented 6 months ago

The above realisation occurred during testing of this PR, https://github.com/Clinical-Genomics/BALSAMIC/pull/1414, which attempted to remove the SVs by adding the -U flag to the tumor-only workflow as well (it had already been added to the tumor-normal workflow).

dnil commented 6 months ago

Thank you for the attention to this - it's (occasionally) an annoying problem for the users, and it feels like it only awaits its first case of a misinterpreted causative variant, but so far I guess they manage.

Using VarDict for SV calling seems to make perfect sense, especially if it reproducibly finds the FLT3-ITD, in contrast to other callers. But surely there should be no short-read scenario where that variant ends up in the SNV file, with SNV annotation? Is there any reason why SVs produced by VarDict should not be split off into an SV VCF directly after calling, and treated as such for the remainder of the pipeline?
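To make the suggestion concrete, a split directly after calling could look roughly like the sketch below. It is not how Balsamic does it: the function names are hypothetical, and it assumes that VarDict SV records can be recognised by `SVTYPE` or `SVLEN` keys in the INFO column or by a symbolic ALT allele. Header lines are copied to both outputs so that each remains a valid VCF.

```python
# Assumption: SV records carry at least one of these INFO keys.
SV_INFO_KEYS = {"SVTYPE", "SVLEN"}


def is_sv_record(line):
    """Return True if a VCF data line looks like a structural variant."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 8:
        return False
    alt, info = fields[4], fields[7]
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}
    return alt.startswith("<") or bool(keys & SV_INFO_KEYS)


def split_vcf_lines(lines):
    """Split VCF lines into (snv_lines, sv_lines), keeping the header in both."""
    snv_lines, sv_lines = [], []
    for line in lines:
        if line.startswith("#"):
            # Header and column lines belong in both output files.
            snv_lines.append(line)
            sv_lines.append(line)
        elif is_sv_record(line):
            sv_lines.append(line)
        else:
            snv_lines.append(line)
    return snv_lines, sv_lines
```

Note that a split like this would also move an FLT3-ITD call out of the SNV file if VarDict reports it as an SV, so the SV file would then need to reach the clinicians who currently look for it among the SNV and InDel results.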