On raw VCFs generated by VarDict and TNscope with the most recent version of BALSAMIC (12.0.2), I have run bcftools the same way as we do in the TGA and WGS workflows.
For TGA:
singularity exec varcall_py3.sif bcftools filter --include 'INFO/AF < 1' --soft-filter 'balsamic_af_one' --mode '+' -o ${1}_balsamic_af_one.vcf $1
For WGS:
singularity exec varcall_py3.sif bcftools filter --include 'FORMAT/AF[0] < 1' --soft-filter 'balsamic_af_one' --mode '+' -o ${1}_balsamic_af_one.vcf $1
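As a rough illustration of what the two commands do, the Python sketch below models the soft filter on a simplified record. The `Record` class and helper names are hypothetical. The only difference between the workflows is which AF field is read: INFO/AF for TGA, the first sample's FORMAT/AF for WGS. The handling of existing FILTER values is an assumption based on how `--mode '+'` is described (new labels are appended instead of replacing existing ones, and a bare PASS is replaced), and records that satisfy the `--include` expression are simply left unchanged here.

```python
from dataclasses import dataclass
from typing import List

LABEL = "balsamic_af_one"  # soft-filter name used by both commands above


@dataclass
class Record:
    """Hypothetical minimal stand-in for one VCF record."""
    info_af: float          # INFO/AF, read by the TGA command
    format_af: List[float]  # FORMAT/AF per sample, read by the WGS command
    filters: List[str]      # FILTER column split on ';' ("PASS" or "." when unset)


def add_soft_filter(filters: List[str], label: str) -> List[str]:
    """Append a label as assumed for --mode '+': PASS/'.' is replaced, other labels kept."""
    kept = [f for f in filters if f not in ("PASS", ".")]
    return kept if label in kept else kept + [label]


def af_one_filter(rec: Record, workflow: str) -> List[str]:
    """Return the FILTER list after the AF filter; records failing --include get LABEL."""
    af = rec.info_af if workflow == "TGA" else rec.format_af[0]  # WGS: first sample
    if af < 1:
        return rec.filters  # satisfies --include: left unchanged in this sketch
    return add_soft_filter(rec.filters, LABEL)


print(af_one_filter(Record(1.0, [1.0], ["PASS"]), "TGA"))        # ['balsamic_af_one']
print(af_one_filter(Record(0.996, [0.996], ["PASS"]), "WGS"))    # ['PASS']
```

Note that a variant at AF 0.996 still passes, which is why a near-clonal variant can sit just on the right side of this filter.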
I'd also like to note that in the last NIQAS3 the clinically relevant variant was a SNV with roughly 0.996 AF, which means it was very close to being filtered out with this filter.
Thanks @mathiasbio for the great summary. I agree that this filter is something we really need to address. We should have solid reasoning for every filter we include.
Do you know anything about the NIQAS3 sample? For example, what was the tumour cell fraction in the sample? The reason I am asking is that I have a really hard time understanding in what biological context a somatic variant could have VAF = 1. It could of course be a measurement artefact (random sampling bias), but this should be very rare. A high VAF could also be explained by a CNA in the tumour (T) sample covering the region of the variant, but in that case the VAF should approach 1 (but never reach it).
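The purity and copy-number argument can be made concrete with the commonly used expected-VAF relation for a clonal somatic SNV, assuming diploid normal cells that do not carry the variant: VAF = p·m / (p·C + 2(1 − p)), where p is the tumour cell fraction, C the total copy number of the locus in tumour cells and m the number of those copies carrying the variant. A minimal Python sketch (the function name is hypothetical):

```python
def expected_vaf(purity: float, tumor_cn: int, mutated_copies: int) -> float:
    """Expected VAF of a clonal somatic SNV.

    purity: tumour cell fraction p in [0, 1]
    tumor_cn: total copy number C of the locus in tumour cells (>= 1)
    mutated_copies: copies m carrying the variant in tumour cells (1 <= m <= C)
    Assumes normal cells are diploid and do not carry the variant.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    if tumor_cn < 1 or not 1 <= mutated_copies <= tumor_cn:
        raise ValueError("need 1 <= mutated_copies <= tumor_cn")
    alt_copies = purity * mutated_copies
    total_copies = purity * tumor_cn + 2.0 * (1.0 - purity)
    return alt_copies / total_copies


# Heterozygous SNV without CNA, 50% purity
print(expected_vaf(0.5, 2, 1))   # 0.25
# Copy-neutral LOH (both copies mutated), 99% purity: close to, but below, 1
print(expected_vaf(0.99, 2, 2))
# The same LOH in a perfectly pure sample
print(expected_vaf(1.0, 2, 2))   # 1.0
```

Under these assumptions the expected VAF only reaches 1 when p = 1 and every tumour copy carries the variant; for any p < 1 it stays below 1, so an observed AF of exactly 1 requires a high expected VAF combined with sampling noise at the observed depth.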
@vwirta I don't remember exactly! But I remember something about overlapping CNVs in the region; I could ask Fulya. I also think it's very rare for a somatic variant to have an AF of 1, but Kalle brought this problem to my attention in the first place, based on an ILC (https://github.com/Clinical-Genomics/External-comparison/issues/22, I think) where we missed some variants with 100% VAF. It probably happens very rarely, so we don't need to get too anxious, but it would be nice to be sure : )
I've asked Fulya about the NIQAS3 case. Good point about the ILC. This was the one organised by Gustave Roussy, and I think it would be good to follow up with them as well regarding the missed variants. @AnnaLeinfelt Do you have the final report from the ILC? I can't find it in my inbox.
No, I don't. I was in contact with them earlier, and at that time the results were about to be submitted. I'll contact them again; this slipped my mind.
Also wrote this in the PR: https://github.com/Clinical-Genomics/BALSAMIC/pull/1338
| Sequencing type | T/TN | Case | # PASS variants (release 13) | # PASS variants (this PR) | # additional PASS variants | % extra PASS variants |
|---|---|---|---|---|---|---|
| WGS | T | civilsole | 33051 | 33051 | 0 | 0% |
| WGS | T | firstviper | 54072 | 54072 | 0 | 0% |
| WGS | TN | fleetjay | 7189 | 7190 | 1 | 0.01% |
| TGA | T | setamoeba | 2415 | 2416 | 1 | 0.04% |
| TGA | TN | unitedbeagle | 1573 | 1573 | 0 | 0% |
| TGA UMI | T (TNscope) | equalbug | 158 | 158 | 0 | 0% |
| TGA UMI | T (VarDict) | equalbug | 70 | 70 | 0 | 0% |
| TGA UMI | TN (TNscope) | uphippo | 124 | 124 | 0 | 0% |
| TGA UMI | TN (VarDict) | uphippo | 105 | 105 | 0 | 0% |
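The last two columns follow directly from the PASS counts: additional = (this PR) − (release 13), and % extra = 100 × additional / (release 13), rounded to two decimals. A small Python check that recomputes them from the counts in the table above (names are illustrative):

```python
from typing import List, Tuple

# (case, # PASS variants release 13, # PASS variants this PR), copied from the table above
COUNTS: List[Tuple[str, int, int]] = [
    ("civilsole", 33051, 33051),
    ("firstviper", 54072, 54072),
    ("fleetjay", 7189, 7190),
    ("setamoeba", 2415, 2416),
    ("unitedbeagle", 1573, 1573),
    ("equalbug (TNscope)", 158, 158),
    ("equalbug (VarDict)", 70, 70),
    ("uphippo (TNscope)", 124, 124),
    ("uphippo (VarDict)", 105, 105),
]


def extra_pass(before: int, after: int) -> Tuple[int, float]:
    """Additional PASS variants and their share of the release-13 count, in percent."""
    extra = after - before
    return extra, 100.0 * extra / before


for case, before, after in COUNTS:
    extra, pct = extra_pass(before, after)
    print(f"{case}: +{extra} ({pct:.2f}%)")
```

Across all nine analyses, removing the filter adds two PASS variants in total.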
Note that many more variants would likely have been added in the T-only WGS cases if the following T-only WGS-specific filter had been removed at the same time:
bcftools filter --threads {threads} --include 'FORMAT/ALT_F1R2 > {params.strand_reads[0]} && (FORMAT/ALT_F1R2 > 0 && FORMAT/ALT_F2R1 > {params.strand_reads[0]} && FORMAT/REF_F1R2 > {params.strand_reads[0]} && FORMAT/REF_F2R1 > {params.strand_reads[0]})' --soft-filter '{params.strand_reads[1]}' --mode '+'
However, because this filter also requires some reads supporting the reference allele, removing the MAX_AF 1 filter has no real effect in this analysis type.
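Why the strand filter makes the AF filter redundant in T-only WGS can be checked with a small Python sketch. It mirrors the `--include` expression above, with `min_reads` standing in for `{params.strand_reads[0]}` (assumed here to be a non-negative integer). It estimates AF as ALT / (ALT + REF) from the same four counts, which is an assumption: the caller's own FORMAT/AF may be computed differently. Any record that passes has REF_F1R2 and REF_F2R1 above the threshold, hence at least one reference read, hence an AF below 1.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class StrandCounts:
    """Read-orientation counts for one sample, named after the FORMAT fields in the filter."""
    alt_f1r2: int
    alt_f2r1: int
    ref_f1r2: int
    ref_f2r1: int


def passes_strand_filter(c: StrandCounts, min_reads: int) -> bool:
    """Mirror of the --include expression; min_reads stands in for {params.strand_reads[0]}."""
    return c.alt_f1r2 > min_reads and (
        c.alt_f1r2 > 0
        and c.alt_f2r1 > min_reads
        and c.ref_f1r2 > min_reads
        and c.ref_f2r1 > min_reads
    )


def af_from_counts(c: StrandCounts) -> float:
    """Assumed AF estimate: ALT reads / (ALT + REF reads), summed over both orientations."""
    alt = c.alt_f1r2 + c.alt_f2r1
    total = alt + c.ref_f1r2 + c.ref_f2r1
    if total == 0:
        raise ValueError("no reads")
    return alt / total


# Exhaustive check on small counts: every record that passes has AF < 1.
for min_reads in range(3):
    for counts in product(range(6), repeat=4):
        c = StrandCounts(*counts)
        if passes_strand_filter(c, min_reads):
            assert af_from_counts(c) < 1.0
print("no all-ALT record passes the strand filter in this sketch")
```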
From these stats, I think we can conclude that it is safe to remove the MAX AF filter, and we can postpone extending this fix to the WGS tumor-only analysis.
Thanks @mathiasbio for a great summary again! I agree with your conclusion.
Merged into https://github.com/Clinical-Genomics/BALSAMIC/pull/1320 🥳 Closing the issue!
Need
Background to this feature can be found here: https://github.com/Clinical-Genomics/BALSAMIC/issues/1166
In short, the need is to remove this filter:
bcftools filter --include 'FORMAT/AF[0] < 1' --soft-filter 'balsamic_af_one' --mode '+'
which is applied in all workflows when filtering SNVs and InDels, in sentieon_quality_filter.rule. It should be removed because there is a risk, especially in very pure tumor samples with somewhat lower coverage, that a true somatic variant reaches an AF of 1 and is filtered out.
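The coverage dependence can be illustrated with a simple binomial sampling model, which is an assumption for illustration and not part of BALSAMIC: if each of D reads independently carries the ALT allele with probability v (the true VAF), the chance that all reads are ALT, giving an observed AF of exactly 1, is v^D. A minimal Python sketch (the function name is hypothetical):

```python
def prob_all_reads_alt(true_vaf: float, depth: int) -> float:
    """Probability that all `depth` reads carry the ALT allele (observed AF exactly 1).

    Simple binomial sampling model: reads are independent and each is ALT with
    probability true_vaf. Mapping, sequencing and duplicate effects are ignored.
    """
    if not 0.0 <= true_vaf <= 1.0:
        raise ValueError("true_vaf must be in [0, 1]")
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return true_vaf ** depth


# A nearly pure tumour with expected VAF 0.95 at different depths
for depth in (20, 50, 100, 250):
    print(f"depth {depth}: P(AF = 1) = {prob_all_reads_alt(0.95, depth):.4f}")
```

With these assumptions and v = 0.95, the probability is roughly 36% at 20x and about 8% at 50x, but well under 1% at 100x, which matches the concern about pure, lower-coverage samples.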
Suggested approach
The suggested approach is simply to remove it from the rule. However, we first want to make sure that the filter is not important for filtering out false positives, so that we can develop a more sophisticated solution if necessary.
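One way to do this check is to run the workflow with and without the filter and list the variants that newly reach PASS, so they can be reviewed as potential false positives. A minimal Python sketch, assuming uncompressed VCF text and keying variants on (CHROM, POS, REF, ALT); the function names and the toy records are hypothetical:

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def pass_variants(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect keys of records whose FILTER column is exactly 'PASS'."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _vid, ref, alt, _qual, flt = line.rstrip("\n").split("\t")[:7]
        if flt == "PASS":
            keys.add((chrom, int(pos), ref, alt))
    return keys


def newly_passing(before: Iterable[str], after: Iterable[str]) -> Set[VariantKey]:
    """Variants that PASS without the AF filter but did not PASS with it."""
    return pass_variants(after) - pass_variants(before)


# Toy records for illustration only
header = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
with_filter = header + [
    "chr1\t100\t.\tA\tT\t50\tPASS\tAF=0.4",
    "chr1\t200\t.\tG\tC\t50\tbalsamic_af_one\tAF=1",
]
without_filter = header + [
    "chr1\t100\t.\tA\tT\t50\tPASS\tAF=0.4",
    "chr1\t200\t.\tG\tC\t50\tPASS\tAF=1",
]
print(newly_passing(with_filter, without_filter))  # {('chr1', 200, 'G', 'C')}
```

Counting the keys returned by `pass_variants` on each output also gives the per-case PASS totals used in a before/after comparison.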