Is your feature request related to a problem? Please describe.
In an inter-laboratory comparison (https://github.com/Clinical-Genomics/External-comparison/issues/22) there were 2 clearly somatic SNVs that were not detected in balsamic, or rather, they were detected but subsequently filtered out.
The likely cause of their absence from the final VCF is that they had an allele frequency of 1, and we have this bcftools filter in all our workflows:
--include 'FORMAT/AF[0] < 1.0' --soft-filter balsamic_af_one --mode +
which translates to: a variant with an allele frequency below 1 will not have this filter assigned to it, but every variant with an allele frequency >= 1 will have this filter and consequently be filtered out.
This was unfortunate in this comparison, and it would have been better if the variants had not been filtered out.
Describe the solution you'd like
I don't know why this filter was implemented.
Maybe it was intended to filter out strange variant calls with an allele frequency above 1 and was implemented incorrectly.
Maybe there are common artifacts in the variant calling with an AF of 1.
Either way, I don't think these variants are very common, as the tumor is rarely this pure. But I can imagine, especially in WGS cases, that this could occur by chance in a very pure tumor sample too.
I think this filter needs to be investigated. If it does not exist for a good reason (such as removing many false-positive calls), it should probably be removed. If it was instead intended to remove variants with an invalid AF above 1, it could be corrected to something like:
--exclude 'FORMAT/AF[0] > 1.0' --soft-filter balsamic_af_one --mode +
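To make the difference between the two expressions concrete, here is a minimal Python sketch of the soft-filter logic as I understand it. It is only a simplified model, not bcftools code: the function names, the example AF values and the assumption that a failing record has a bare PASS replaced by the filter label are all mine. Only the label balsamic_af_one and the two thresholds come from the expressions above.

```python
from typing import Callable, List

FILTER_NAME = "balsamic_af_one"  # soft-filter label from the issue


def soft_filter(af: float, passes: Callable[[float], bool], filters: List[str]) -> List[str]:
    """Simplified model of `--soft-filter NAME --mode +` (an assumption, not bcftools code).

    Records that pass keep their FILTER values; failing records get NAME
    appended to any existing filters, with a bare PASS replaced by NAME.
    """
    if passes(af):
        return list(filters)
    kept = [f for f in filters if f != "PASS"]
    return kept + [FILTER_NAME]


def current_rule(af: float) -> bool:
    # --include 'FORMAT/AF[0] < 1.0': only AF strictly below 1 passes
    return af < 1.0


def proposed_rule(af: float) -> bool:
    # --exclude 'FORMAT/AF[0] > 1.0': everything passes except AF above 1
    return not af > 1.0


if __name__ == "__main__":
    # Made-up AF values for illustration only
    for af in (0.35, 1.0, 1.2):
        print(af, soft_filter(af, current_rule, ["PASS"]), soft_filter(af, proposed_rule, ["PASS"]))
```

Under this model, a variant with an AF of exactly 1 is labelled by the current rule but left untouched by the proposed one, while an AF above 1 is labelled by both.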
Current BALSAMIC version
balsamic --version
12.0.0