TORCH-Consortium / MAGMA

A pipeline for comprehensive genomic analyses of Mycobacterium tuberculosis with a focus on clinical decision making as well as research
https://doi.org/10.1371/journal.pcbi.1011648
GNU General Public License v3.0

Improve strand bias filter for minor/infrequent variants #202

Closed: abhi18av closed this 5 months ago

abhi18av commented 5 months ago

From @TimHHH's initial message:

A while ago we encountered a problem with what were probably false-positive minor-frequency variants; these were subsequently interpreted for drug resistance (DR). We identified that strand bias was indicative of the false positivity, but also that the strand bias statistic was not perfect, occasionally giving high values despite true positivity. On top of that, the developer of LoFreq also indicates that this SB statistic is no good (https://sourceforge.net/p/lofreq/discussion/general/thread/27dd92cb/). Hence the quest for a better minor variant strand bias filter.
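
For readers unfamiliar with the statistic: strand bias is commonly scored by comparing how forward and reverse reads split between the reference and alternate alleles, for example with a Phred-scaled Fisher's exact test on the four strand counts (the counts a DP4-style VCF field holds). The sketch below implements that generic test in plain Python, assuming that formulation; it is an illustration, not the filter adopted in this PR, and `strand_bias_phred` is a hypothetical name. One property worth noting is that the score depends on depth, so the same strand ratio gives a larger value at higher coverage.

```python
import math


def _table_prob(x: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table whose top-left cell is x,
    given fixed row total row1, column total col1 and grand total n."""
    return math.comb(row1, x) * math.comb(n - row1, col1 - x) / math.comb(n, col1)


def strand_bias_phred(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled two-sided Fisher's exact test on strand counts.

    Rows are ref/alt reads, columns are forward/reverse strand. Higher
    values mean the alt reads are split across strands differently from
    the ref reads. (Hypothetical helper, for illustration only.)
    """
    row1 = ref_fwd + ref_rev              # total ref reads
    col1 = ref_fwd + alt_fwd              # total forward-strand reads
    n = ref_fwd + ref_rev + alt_fwd + alt_rev
    if n == 0:
        return 0.0
    observed = _table_prob(ref_fwd, row1, col1, n)
    lo = max(0, row1 + col1 - n)          # smallest feasible top-left cell
    hi = min(row1, col1)                  # largest feasible top-left cell
    # Two-sided p-value: sum every table no more probable than the observed one.
    p = sum(
        prob
        for prob in (_table_prob(x, row1, col1, n) for x in range(lo, hi + 1))
        if prob <= observed * (1 + 1e-7)
    )
    p = min(1.0, p)                       # absorb floating-point overshoot
    # Guard against float underflow to 0 for very large, very biased tables.
    return -10 * math.log10(max(p, 1e-300))
```

For example, `strand_bias_phred(50, 50, 5, 5)` is essentially 0 because the alt reads mirror the ref strand split, whereas `strand_bias_phred(50, 50, 10, 0)` is about 27 because all ten alt reads sit on the forward strand.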

abhi18av commented 5 months ago

@TimHHH, please let us know here if this looks good and we can go ahead with the merge :)

abhi18av commented 5 months ago

@TimHHH and @vrennie have given the go-ahead for this PR. I'm adding the word of advice Tim offered, for reference:

Now the strand bias filter is working correctly; however, you will see that a large number of minor variants are present in the output. Of course, only some of these will fall in DR-relevant regions, but if you find that too many unreliable mutations are passing through, other filters may have to be implemented. For example, if you look at the variants with low QUAL scores in the VCF, you will notice some suspicious mutations, such as ones with very low coverage. So it may be worth looking at some realistic data first to see whether the quantity of minor variant data is acceptable or ridiculous.
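
Tim's suggestion to check how many minor variants would survive extra QUAL and coverage cut-offs can be prototyped before anything is wired into the pipeline. The sketch below assumes variants have already been parsed from the VCF into simple records; the `Variant` class, the thresholds `MINOR_AF_MAX`, `MIN_QUAL` and `MIN_DP`, and the function names are all hypothetical placeholders, not MAGMA parameters.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical cut-offs for illustration only; not MAGMA defaults.
MINOR_AF_MAX = 0.90   # variants below this allele frequency count as "minor"
MIN_QUAL = 30.0       # minimum VCF QUAL for a minor variant to be kept
MIN_DP = 20           # minimum read depth at the site


@dataclass
class Variant:
    pos: int      # position on the reference
    af: float     # alternate allele frequency
    qual: float   # QUAL column from the VCF
    dp: int       # total read depth at the site


def is_minor(v: Variant) -> bool:
    return v.af < MINOR_AF_MAX


def passes_extra_filters(v: Variant) -> bool:
    """Major variants pass unchanged; minor variants need enough QUAL and depth."""
    if not is_minor(v):
        return True
    return v.qual >= MIN_QUAL and v.dp >= MIN_DP


def minor_variant_summary(variants: Iterable[Variant]) -> dict:
    """Count minor variants before and after the extra filters."""
    minor = [v for v in variants if is_minor(v)]
    kept = [v for v in minor if passes_extra_filters(v)]
    return {
        "minor_total": len(minor),
        "minor_kept": len(kept),
        "minor_dropped": len(minor) - len(kept),
    }
```

Running a summary like this over a few realistic samples gives a quick "acceptable or ridiculous" reading on minor variant volume, and the cut-offs can be tuned against it before any filter is added to the workflow.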

And the current decision:

We will check these unreliable mutations if they get flagged by lab pathologists and ... extract data for a meta-analysis when we have a diverse enough dataset.