Closed: mathiasbio closed this issue 1 year ago
It seems that this issue has been discussed in https://github.com/Ensembl/ensembl-vep/issues/600 and fixed in releases >109, so that variants that are too long to annotate are still written to the output instead of being filtered away.
A simple first step might therefore be to update VEP, which could solve this issue, hopefully without causing any other problems.
+1 VEP update
Another option, perhaps a better one, would be to simply raise --max_sv_size to the size of chromosome 1 (248956422), as they have done in MIP. Then we'd not only preserve the variants in the VCF, but also have them annotated as intended!
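To make the proposal concrete, here is a minimal sketch of how the VEP call could carry the larger limit. The helper name vep_sv_command and the file names are hypothetical, and the real vep_somatic_sv rule in BALSAMIC passes more options than shown here; only --max_sv_size and the value 248956422 come from this thread.

```python
# Size of chromosome 1 in bp, the value MIP is reported to use in this thread.
CHR1_LENGTH = 248_956_422


def vep_sv_command(input_vcf, output_vcf, max_sv_size=CHR1_LENGTH):
    """Build a VEP argument list with a raised --max_sv_size.

    Hypothetical helper for illustration only; it is not BALSAMIC's actual
    rule and it leaves out cache, plugin and species options.
    """
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",  # write VCF output
        "--max_sv_size", str(max_sv_size),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, since VEP may not be installed.
    print(" ".join(vep_sv_command("SV.in.vcf.gz", "SV.out.vcf.gz")))
```

Passing the arguments as a list means the same structure could be handed to subprocess.run without shell quoting problems.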
There must have been a reason why max_sv_size was included as an option in the first place, and I think I read it had something to do with memory requirements for longer SVs, as mentioned here: https://github.com/Ensembl/ensembl-vep/pull/463.
It still seems like a good idea to increase max_sv_size; if it works for MIP, it probably works for us too. I will begin by testing whether raising max_sv_size solves the issue without causing memory problems.
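Before testing, it may help to know which calls are actually affected. The sketch below flags VCF records whose span exceeds a given limit. It assumes the span can be approximated as INFO END minus POS (falling back to SVLEN), which may not match exactly how VEP measures SV size, and it assumes the 10,000,000 bp limit discussed in this issue. The function names are made up for illustration.

```python
# Limit discussed in this issue; assumed here, not read from VEP itself.
ASSUMED_DEFAULT_MAX_SV_SIZE = 10_000_000


def sv_length(vcf_line):
    """Approximate the SV span in bp from one VCF record line.

    Uses INFO END - POS when END is present, otherwise abs(SVLEN).
    Returns None when neither is available (e.g. some BND records).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    pos = int(fields[1])
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    if "END" in info:
        return int(info["END"]) - pos
    if "SVLEN" in info:
        # SVLEN may be negative for deletions and may hold a comma-separated list.
        return abs(int(info["SVLEN"].split(",")[0]))
    return None


def exceeds_limit(vcf_line, max_sv_size=ASSUMED_DEFAULT_MAX_SV_SIZE):
    """Return True if the record's approximate span is above max_sv_size."""
    length = sv_length(vcf_line)
    return length is not None and length > max_sv_size
```

Running this over the pre-VEP VCF and comparing the count with the number of VEP warnings would be a quick sanity check that size is the only reason variants go missing.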
We need to discuss whether it is preferable to annotate these long variants or to keep them unannotated. As far as I understand it:
If we want to solve this issue in https://github.com/Clinical-Genomics/BALSAMIC/issues/1119, the fastest solution seems to be simply to annotate variants even though they are larger than 10 Mb. Updating VEP may require too much work to verify that everything still works as it should.
I have spoken to Chiara and Daniel Nilsson about uploading variants that have been annotated up to the size of chromosome 1, and while these annotations do not seem very useful, they did not consider the annotation itself problematic.
I will test uploading the SVs for a WGS T/N case, including annotations even for variants above 10 Mb.
After discussions it has been agreed to include this fix in the next release. The PR has been merged into develop, so I think I can close this issue now.
Describe the bug
Looking at the WGS tumor/normal case "notedshark", SV calls are disappearing during the "vep_somatic_sv" rule. The number of variants in the VCF files before and after the rule differs:
vcf/SV.somatic.notedshark.svdb.research.vcf.gz
variants: 16152
vep/SV.somatic.notedshark.svdb.research.vcf.gz
variants: 15844
This is a difference of 308 variants, which corresponds exactly to the number of warnings in the vep_somatic_sv stderr output from balsamic.
grep WARNING BALSAMIC.notedshark.vep_somatic_sv.5.sh_4554097.err | wc -l
= 308
Looking for an individual variant from the list of warnings, such as "MantaDUP:TANDEM:335100:0:2:0:0:0:manta|DUP00079774:dellysv", you can find it in the VCF before but not after the vep_somatic_sv rule.
I can only conclude that variants triggering this warning in VEP are in effect filtered out. This should not be the intended behaviour, and it seems very urgent to fix!
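The before/after comparison described above can be sketched as a small script that lists the variant IDs present in the input VCF but missing from the VEP output. This assumes the ID column is unique per variant and is left unchanged by VEP, which holds for the example ID quoted above but is not guaranteed in general. The function names are made up for illustration.

```python
import gzip


def variant_ids(vcf_gz_path):
    """Return the ID column of every non-header record in a gzipped VCF."""
    ids = []
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            ids.append(line.split("\t")[2])
    return ids


def dropped_variants(before_path, after_path):
    """List IDs found in the 'before' VCF that are absent from the 'after' VCF.

    Assumes IDs are unique and preserved by the annotation step.
    """
    after = set(variant_ids(after_path))
    return [vid for vid in variant_ids(before_path) if vid not in after]
```

For the case above one would call dropped_variants on the vcf/ and vep/ files and check whether the resulting IDs match the ones named in the VEP warnings.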
Version (please complete the following information):
balsamic --version
11.2.0
Additional context
Perhaps this fix can be included in the new release of balsamic! https://github.com/orgs/Clinical-Genomics/projects/46 Perhaps added as an extra PR in this issue: https://github.com/Clinical-Genomics/BALSAMIC/issues/1119