Open dmacmillan opened 3 years ago
Sounds like maybe you're using that function for something different than it was designed for. Are you passing in sample data?
As far as I remember, that was intended to handle primer sequences that contained mixtures, so it would look for all the possible combinations. Are you searching for a primer sequence that contains dashes?
Yes, it ties into ProbeFinder
which I use to locate primers within sample sequences. One of the culprit sequences I found contained many dash characters.
Is that culprit sequence a primer?
Yes and no, I try to align the ends of the samples to the ends of HXB2 which contain primers but are not uniquely primers as they have extra sequence on the ends in case the samples only align a small segment within the primer regions
OK, so you want to look at the ends of the sample sequence to see if they match anything in HXB2 relatively close to the expected primer regions? Here are some ideas:
genome_coverage.csv
file to tell you where the end of the contig aligns in HXB2. In v7.13, that is based on minimap2
.
https://github.com/cfe-lab/MiCall/blob/f79fca0d135dd08d22c4324f9e81bc819db813ce/micall/utils/probe_finder.py#L127
I traced a bug to this function where if the input argument
seq
contains too many ambiguous characters (ones that are resolved to many, i.e. '-' -> 'ACTG') then the function will hang because it attempts to resolve every possible sequence (which can be a very large number).To correct this I count all mixture characters first and instantiate a
count=1
. For each mixture I multiply the count by the number of potential resolution characters. I.e. "A-A-" has 2 mixture characters, '-' resolves to A,C,T, or G so it can resolve to 4 potential sequences. Since I see two '-' characters I multiply thecount
by 4 twice. I then create a threshold (i.e. 1000) such that if thecount
(which equals the number of potential sequences) is greater than that number, this function returns with nothing.Let me know if this is a viable solution or if you have a different or better proposal.