Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Investigate the viability of bcftools filter Maximum AF(tumor) < 1 #1166

Closed mathiasbio closed 9 months ago

mathiasbio commented 1 year ago

Is your feature request related to a problem? Please describe.

In an inter-laboratory comparison (https://github.com/Clinical-Genomics/External-comparison/issues/22) there were 2 clearly somatic SNVs that BALSAMIC did not report. More precisely, they were called but subsequently filtered out.

The likely reason they are missing from the final VCF is that they had an allele frequency of 1, and all our workflows apply this bcftools filter: --include 'FORMAT/AF[0] < 1.0' --soft-filter balsamic_af_one --mode +. In other words, a variant with an allele frequency below 1 is not assigned this filter, while every variant with an allele frequency >= 1 is assigned it and is consequently filtered out.
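To make the soft-filter behaviour concrete, here is a minimal Python sketch that mimics how the expression above labels a record. It does not call bcftools. The function name, the example AF values, and the LowDepth filter name are made up for illustration, and the model assumes each record has a single numeric tumor AF and a non-empty FILTER field.

```python
def apply_include_soft_filter(filters, tumor_af, label="balsamic_af_one"):
    """Mimic: --include 'FORMAT/AF[0] < 1.0' --soft-filter <label> --mode +

    `filters` is the record's existing FILTER column as a list of strings
    (assumed non-empty, e.g. ["PASS"]).
    """
    # Records for which the --include expression is true pass untouched.
    if tumor_af < 1.0:
        return list(filters)
    # Records failing the expression get the label appended (--mode +).
    # Assumption: an existing PASS is dropped once a real filter is added.
    kept = [f for f in filters if f != "PASS"]
    return kept + [label]


# Hypothetical records: (existing FILTER values, tumor AF)
for existing, af in [(["PASS"], 0.35), (["PASS"], 1.0), (["LowDepth"], 1.0)]:
    print(af, existing, "->", apply_include_soft_filter(existing, af))
```

The second and third records are the case described in this issue: an AF of exactly 1.0 fails the `< 1.0` test and picks up balsamic_af_one.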

This cost us two true positives in the comparison, and it would have been better if these variants had not been filtered out.

Describe the solution you'd like

I don't know why this filter was implemented.

Either way, I don't think these variants are very common, as the tumor is rarely this pure. But I can imagine, especially in WGS cases, that this could also occur by chance in a very pure tumor sample.

I think this filter needs to be investigated. If there is no good reason for it (such as removing lots of false positive calls), it should probably be removed. If instead it was intended to remove variants with an illegal AF above 1, it could maybe be corrected to something like: --exclude 'FORMAT/AF[0] > 1.0' --soft-filter balsamic_af_one --mode +
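The difference between the current expression and the suggested one only shows up at the boundary. A small Python sketch of the two predicates, under the assumption that the tumor AF is a single numeric value (the function names and the AF values are illustrative, not taken from BALSAMIC):

```python
def flagged_by_current(tumor_af):
    # --include 'FORMAT/AF[0] < 1.0': the soft filter is applied
    # when the include expression is false.
    return not (tumor_af < 1.0)


def flagged_by_proposed(tumor_af):
    # --exclude 'FORMAT/AF[0] > 1.0': the soft filter is applied
    # when the exclude expression is true.
    return tumor_af > 1.0


# Illustrative AFs: a typical somatic call, a fully clonal call,
# and an illegal value above 1.
for af in (0.35, 1.0, 1.2):
    print(f"AF={af}: current={flagged_by_current(af)}, "
          f"proposed={flagged_by_proposed(af)}")
```

Only AF == 1.0 is treated differently: the current filter flags it, while the suggested one keeps it and still flags values above 1.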

Current BALSAMIC version

balsamic --version: 12.0.0

mathiasbio commented 9 months ago

Closed https://github.com/Clinical-Genomics/BALSAMIC/pull/1338