OLC-Bioinformatics / ConFindr

Intra-species bacterial contamination detection
https://olc-bioinformatics.github.io/ConFindr/
MIT License

base fraction cutoff #5

Closed antunderwood closed 5 years ago

antunderwood commented 5 years ago

Hi Andrew

I've added a feature so that you can specify a cutoff based on the fraction of high-quality (HQ) bases, not just their absolute count. This is probably important where depth of coverage is high, because 2 ALT bases could simply be sequencing errors. Using a fraction_cutoff of >=0.05 and a count_cutoff of >=2, I eliminated a lot of SNVs that, based on manual inspection, I think were false positives.
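A minimal sketch of how the two cutoffs combine, showing why a fraction threshold matters at high depth. The function name echoes the one mentioned later in this thread, but the signature, the dict-of-counts input and the body are hypothetical illustrations, not ConFindr's actual code. A base is counted only if it passes both the absolute count cutoff and the fraction cutoff, using the >= comparisons described above.

```python
def number_of_bases_above_threshold(base_counts, count_cutoff=2, fraction_cutoff=0.05):
    """Count how many distinct bases pass both an absolute and a relative cutoff.

    Hypothetical sketch: base_counts maps each base (e.g. 'A') to the number
    of high-quality reads supporting it at a single position.
    """
    total = sum(base_counts.values())
    if total == 0:
        return 0
    # A base must clear BOTH thresholds to count as genuinely present.
    return sum(
        1
        for count in base_counts.values()
        if count >= count_cutoff and count / total >= fraction_cutoff
    )


# High depth: 2 ALT reads out of 1000 is only 0.2%, so only the REF base passes.
print(number_of_bases_above_threshold({'A': 998, 'G': 2}))  # 1
# Low depth: 2 ALT reads out of 20 is 10%, so both bases pass (a multibase site).
print(number_of_bases_above_threshold({'A': 18, 'G': 2}))  # 2
```

With a count cutoff alone, both positions above would be flagged as multibase; the fraction cutoff is what rejects the high-depth case.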

In addition I have made a couple of changes so that it will account for

I have made extensive use of list and dictionary comprehensions in the find_if_multibase function to simplify the code a little.
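To illustrate the comprehension style described above, here is a hypothetical helper that tallies high-quality bases at one position. The name high_quality_base_counts, the list of (base, phred_quality) tuples as input, and the Q20 default are all assumptions for illustration; this is not the body of find_if_multibase.

```python
def high_quality_base_counts(pileup, min_base_quality=20):
    """Tally bases at one position, keeping only calls at or above min_base_quality.

    Hypothetical sketch: pileup is a list of (base, phred_quality) tuples,
    one per aligned read covering the position.
    """
    # List comprehension: keep only the bases whose quality passes the filter.
    hq_bases = [base for base, quality in pileup if quality >= min_base_quality]
    # Dictionary comprehension: count occurrences of each distinct base.
    return {base: hq_bases.count(base) for base in set(hq_bases)}


pileup = [('A', 35), ('A', 30), ('G', 12), ('G', 28), ('A', 40)]
# The Q12 'G' call is dropped, leaving three 'A' and one 'G'.
print(sorted(high_quality_base_counts(pileup).items()))  # [('A', 3), ('G', 1)]
```

Calling list.count inside the dictionary comprehension rescans the list once per distinct base, which is cheap here because there are at most a handful of nucleotides; collections.Counter(hq_bases) would be an equivalent single-pass alternative.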

I hope the PR makes sense. Thanks for a great concept realised in this software.

antunderwood commented 5 years ago

By the way, I added a few tests for my altered function, which I renamed from has_two_high_quality_bases to number_of_bases_above_threshold 😄

lowandrew commented 5 years ago

This looks great! A fraction-based contamination cutoff is something I've been meaning to get around to implementing for a while but hadn't yet. Thanks!