Closed antunderwood closed 5 years ago
By the way, I added a few tests for the function I altered, which changed from has_two_high_quality_bases to number_of_bases_above_threshold
😄
This looks great. The fraction contamination is something I've been meaning to implement for a while but hadn't gotten around to yet. Thanks!
Hi Andrew
I've added a feature so that you can specify a cutoff based on the fraction of HQ bases, not just the absolute count. This is probably important where there is a high depth of coverage and 2 ALT bases could be sequencing error. Using a fraction_cutoff of >=0.05 and a count_cutoff of >=2, I eliminated a lot of SNVs that I think were false positives, based on manual inspection.
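To illustrate how the two cutoffs combine, here is a minimal sketch. It is not the code from this PR: the function name `passes_cutoffs` and its signature are hypothetical, and only the default values (>=2 and >=0.05) come from the description above.

```python
def passes_cutoffs(alt_count, total_count, count_cutoff=2, fraction_cutoff=0.05):
    """Return True if a base meets both the absolute and the fractional cutoff.

    Hypothetical helper for illustration only; not part of ConFindr's API.
    alt_count   -- number of high-quality reads supporting the base
    total_count -- total number of high-quality reads at the position
    """
    if total_count <= 0:
        # No usable coverage, so nothing can pass.
        return False
    fraction = alt_count / total_count
    return alt_count >= count_cutoff and fraction >= fraction_cutoff


# 5 of 50 reads is 10% of the depth, so both cutoffs are met.
print(passes_cutoffs(5, 50))
# 2 of 200 reads meets the count cutoff but is only 1% of the depth.
print(passes_cutoffs(2, 200))
```

With a count cutoff alone, the second case would be reported as an SNV. The fraction cutoff is what filters it out at high depth.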
In addition, I have made a couple of changes so that it will:
- account for multiple SNVs in one contig: https://github.com/lowandrew/ConFindr/blob/ab3bedbd70a959dd2498afc3c44f18c6a122c228/confindr/confindr.py#L598 and https://github.com/lowandrew/ConFindr/blob/ab3bedbd70a959dd2498afc3c44f18c6a122c228/confindr/confindr.py#L761
- allow for the fact that at some positions there may be three bases, e.g. {'A': 64, 'T': 2, 'C': 1}: https://github.com/lowandrew/ConFindr/blob/ab3bedbd70a959dd2498afc3c44f18c6a122c228/confindr/confindr.py#L294
I have made extensive use of list and dictionary comprehensions in the find_if_multibase function to simplify the code a little.
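As a rough sketch of the comprehension style applied to a position with more than two bases (this is not the actual find_if_multibase code; `bases_above_cutoffs` and `is_multibase` are hypothetical names, and the dictionary layout is assumed from the example above):

```python
def bases_above_cutoffs(base_counts, count_cutoff=2, fraction_cutoff=0.05):
    """Return the bases at one position that pass both cutoffs.

    base_counts maps each base to its high-quality read count,
    e.g. {'A': 64, 'T': 2, 'C': 1}. Hypothetical helper, for illustration.
    """
    total = sum(base_counts.values())
    if total == 0:
        return {}
    # Dictionary comprehension: keep only bases meeting both cutoffs.
    return {
        base: count
        for base, count in base_counts.items()
        if count >= count_cutoff and count / total >= fraction_cutoff
    }


def is_multibase(base_counts, **cutoffs):
    """Treat a position as multi-base if two or more bases pass the cutoffs."""
    return len(bases_above_cutoffs(base_counts, **cutoffs)) >= 2


counts = {'A': 64, 'T': 2, 'C': 1}
# Only 'A' passes: 'T' is 2/67 (about 3%) and 'C' has a single read.
print(bases_above_cutoffs(counts))
# With the fraction cutoff disabled, 'T' passes on count alone.
print(is_multibase(counts, fraction_cutoff=0.0))
```

Because the comprehension iterates over whatever keys are present, the same code handles two, three, or four observed bases without special cases.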
I hope that the PR makes sense. Thanks for a great concept realised in this software.