Closed ArtPoon closed 7 years ago
Note the above examples were run with default gap open penalty of 1
, which is very permissive of gaps.
Interesting: if I set g2.gap_open_penalty=5
then I get results for the full 100bp sequences:
>>> g2.gap_open_penalty = 5
>>> g2.align(s1[:90], s2[:90])
('TTTTTAGATGGGATAGATAAAGCTCAAGAAGAACATGA-AAGATATCACAGCAATTGGAGAGCAATGGCTAGTGATTTTAATCTGCCACCT', 'TTTTTGGATGGAATAGATAAGGCTCAAGAAGAACATGAGAA-ATATCACAACAATTGGAGAGCAATGGCTAGTGATTTTAACCTACCACCC', 370)
>>> g2.align(s1, s2)
('TTTTTAGATGGGATAGATAAAGCTCAAGAAGAACATGA-AAGATATCACAGCAATTGGAGAGCAATGGCTAGTGATTTTAATCTGCCACCTATAGTAGCAA', 'TTTTTGGATGGAATAGATAAGGCTCAAGAAGAACATGAGAA-ATATCACAACAATTGGAGAGCAATGGCTAGTGATTTTAACCTACCACCCGTGGTAGCAA', 402)
Wrote a unit test that reproduces this problem. Also printing cost matrix to stdout. When gap_open_penalty
is set to 3
, we get no error and this matrix (last 10 rows and columns only):
84 48 48 48 48 48 48 48 48 48 18
72 84 40 40 80 48 48 112 48 48 18
72 40 84 48 40 80 48 48 48 50 18
72 80 40 84 48 40 80 48 48 48 22
72 72 72 72 84 48 40 80 48 48 18
72 72 80 72 72 84 48 40 48 50 18
72 88 72 80 72 72 84 48 48 48 22
72 72 72 72 80 72 72 84 48 48 18
72 72 72 72 72 72 72 72 4 18 18
73 88 73 72 72 73 72 72 65 4 22
65 68 65 69 65 65 69 65 65 69 4
but if it is set to 2
then there is an error and this matrix:
84 48 48 48 48 48 48 48 48 48 18
72 124 40 40 80 48 48 112 48 48 18
72 40 84 48 40 80 48 48 48 50 18
72 80 40 84 48 40 80 48 48 48 22
72 72 104 72 84 48 40 80 48 48 18
72 72 80 104 72 84 48 40 56 58 26
72 88 72 80 104 72 84 48 48 48 22
72 72 72 72 80 104 72 84 48 48 18
72 72 72 72 72 72 72 72 4 18 18
73 88 73 72 72 73 72 72 65 4 22
65 68 65 69 65 65 69 65 65 69 4
Note is_global
is set to True
.
Edit: Whoops, these are bit matrices, not cost matrices
Got some more debugging output, traceback fails at position i=83
, j=82
.
Bit matrix from 79
to 89
:
79 | 80 | 81 | 82 | 83 | 84 | |
---|---|---|---|---|---|---|
79 | 84 | 48 | 48 | 48 | 48 | 48 |
80 | 72 | 84 | 48 | 48 | 48 | 48 |
81 | 72 | 72 | 124 | 40 | 40 | 40 |
82 | 72 | 72 | 40 | 47 | 40 | 40 |
83 | 72 | 72 | 80 | 48 | 54 | 48 |
84 | 72 | 72 | 72 | 120 | 40 | 84 |
Cost matrix, same slice:
79 | 80 | 81 | 82 | 83 | 84 | |
---|---|---|---|---|---|---|
79 | -339 | -336 | -335 | -334 | -333 | -332 |
80 | -336 | -344 | -341 | -340 | -339 | -338 |
81 | -335 | -341 | -340 | -337 | -336 | -335 |
82 | -334 | -340 | -337 | -336 | -335 | -334 |
83 | -333 | -339 | -345 | -342 | -341 | -340 |
84 | -332 | -338 | -342 | -341 | -338 | -346 |
48
is bit-string 110000
, which corresponds to setting flags for e
and f
.
To reproduce:
We can align slightly shorter sequences:
but we can't do all 100: