ArtPoon / gotoh2

Lightweight and customizable Python/C extension for pairwise alignment of genetic sequences using the Gotoh algorithm
GNU Affero General Public License v3.0

Refactor broke traceback #2

Closed. ArtPoon closed 7 years ago

ArtPoon commented 7 years ago

Original motivation for issue #1.
I've substituted a C struct for the alignment matrices being passed around among the core functions, which tidies things up a fair amount. For the first unit test, the R matrix is computed correctly:

0 6 7 8 
6 -5 1 2 
7 1 -10 -4 
8 2 -4 -6 
9 3 -3 -9 

but the traceback is wrong:

alen i j type
0 4 3 Vertical
1 3 3 Diagonal
2 2 2 Diagonal
3 1 1 Diagonal

ArtPoon commented 7 years ago

P matrix:

2147483647 2147483647 2147483647 2147483647 
6 12 13 14 
7 1 7 8 
8 2 -4 2 
9 3 -3 0

Q matrix:

2147483647 6 7 8 
2147483647 12 1 2 
2147483647 13 7 -4 
2147483647 14 8 2 
2147483647 15 9 3 

These seem to be correct.
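As a cross-check, here is a minimal Python sketch of the cost-minimizing Gotoh recurrences that reproduces the R, P and Q matrices printed above. Everything specific in it is an assumption inferred from the numbers rather than taken from the gotoh2 source: the function name `gotoh_matrices`, the scores (match -5, mismatch 4, gap open 5, gap extend 1), and the hypothetical sequences `ACTG` and `ACG`, which are just one pair that fits the printed values.

```python
INF = 2147483647  # same INT_MAX sentinel that appears in the P and Q printouts


def gotoh_matrices(a, b, match=-5, mismatch=4, gap_open=5, gap_extend=1):
    """Fill R (best cost), P (vertical gap) and Q (horizontal gap) matrices.

    Scores are costs to be minimized; all default values are assumptions
    inferred from the matrices in this issue, not read from gotoh2.
    """
    m, n = len(a), len(b)
    R = [[0] * (n + 1) for _ in range(m + 1)]
    P = [[INF] * (n + 1) for _ in range(m + 1)]
    Q = [[INF] * (n + 1) for _ in range(m + 1)]

    # First row: a horizontal gap of length j.
    for j in range(1, n + 1):
        Q[0][j] = gap_open + gap_extend * j
        R[0][j] = Q[0][j]

    for i in range(1, m + 1):
        # First column: a vertical gap of length i.
        P[i][0] = gap_open + gap_extend * i
        R[i][0] = P[i][0]
        for j in range(1, n + 1):
            # Open a new gap from R, or extend the gap already open in P / Q.
            P[i][j] = min(R[i - 1][j] + gap_open + gap_extend,
                          P[i - 1][j] + gap_extend)
            Q[i][j] = min(R[i][j - 1] + gap_open + gap_extend,
                          Q[i][j - 1] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            R[i][j] = min(R[i - 1][j - 1] + s, P[i][j], Q[i][j])
    return R, P, Q


R, P, Q = gotoh_matrices("ACTG", "ACG")  # hypothetical sequences
for name, mat in (("R", R), ("P", P), ("Q", Q)):
    print(name)
    for row in mat:
        print(" ".join(str(x) for x in row))
```

Under these assumptions the three matrices come out cell for cell as printed, which supports the conclusion that the matrix computation itself is not where the refactor broke.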

ArtPoon commented 7 years ago

bit matrix:

0 16 16 16 
64 84 50 18 
64 73 84 18 
64 73 73 20 
64 65 65 4 

This also matches what I'm getting from master branch, so the problem is not in matrix computation. Checking edge assignment next.

ArtPoon commented 7 years ago

bit matrix after edge assignment:

0 23 23 7 
64 84 50 18 
64 79 84 7 
64 79 73 20 
64 7 65 7 

Aha! This is totally different!

ArtPoon commented 7 years ago

I suspect this is an edge-effect problem. The edge assignment procedure iterates through the bit matrix starting from the lower-right cell, and for each cell it looks up the cells to the right, below, and diagonally down-right. For the starting cell, and for the rest of the last row and column, those neighbours lie beyond the cells that were computed, so they are unassigned and get read as random noise that can still be used in bitwise operations.
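One way to take the noise out of the picture, sketched in Python rather than the extension's C, is to pad the bit matrix with one extra row and column of a known value before walking it from the lower-right corner. The names `pad_bits` and `neighbours_from_lower_right` and the fill value 0 are assumptions made for illustration; this is not the gotoh2 edge assignment code, only the lookup pattern it describes.

```python
def pad_bits(bits, fill=0):
    """Return a copy of `bits` with one extra column and row set to `fill`."""
    ncol = len(bits[0])
    padded = [row[:] + [fill] for row in bits]
    padded.append([fill] * (ncol + 1))
    return padded


def neighbours_from_lower_right(bits, fill=0):
    """Visit cells from the lower right and collect their three neighbours.

    Each entry is (i, j, right, down, diagonal). Thanks to the padding,
    lookups from the last row and column return `fill`, not garbage.
    """
    padded = pad_bits(bits, fill)
    seen = []
    for i in range(len(bits) - 1, -1, -1):
        for j in range(len(bits[0]) - 1, -1, -1):
            right = padded[i][j + 1]
            down = padded[i + 1][j]
            diag = padded[i + 1][j + 1]
            seen.append((i, j, right, down, diag))
    return seen


# Bit matrix before edge assignment, copied from the earlier comment.
bits = [
    [0, 16, 16, 16],
    [64, 84, 50, 18],
    [64, 73, 84, 18],
    [64, 73, 73, 20],
    [64, 65, 65, 4],
]
seen = neighbours_from_lower_right(bits)
print(seen[0])  # neighbours of the starting cell are all the fill value
```

In Python an out-of-range index raises an error instead of returning noise, so the sketch only shows the indexing. In C the equivalent would be to allocate the extra row and column and zero them explicitly.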

As an experiment, I started the calculation at the cell to the upper-left of the original starting cell and got this bit matrix after edge assignment:

0 16 16 16 
64 84 55 18 
64 79 84 18 
64 79 73 20 
64 65 65 4 

This is actually closer to what I have been getting in the master branch (note that since the master branch uses the same edge assignment code, it shouldn't be relied on either!). I need to work out what the bit matrices are supposed to look like manually :-/

ArtPoon commented 7 years ago

Bit matrix after edge assignment as of commit 950a81cbf63a322c30c26af6bf83c740b3b46671:

0 16 16 16 
64 4 66 50 
64 17 4 18 
64 25 1 20 
64 73 73 4 

ArtPoon commented 7 years ago

Binary conversion to paths:

none   e   e   e
   g   c  bg bcg
   g  ae   c  be
   g ade   a  ce
   g adg adg   c
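The conversion above can be scripted so it does not have to be redone by hand after every change. This is a small sketch that assumes the flags are packed as a=1, b=2, c=4, d=8, e=16, f=32, g=64. That mapping is inferred from the table, not read from the source, and `decode` is a hypothetical helper.

```python
FLAGS = "abcdefg"  # assumed bit order: a is the lowest bit, g the highest


def decode(cell):
    """Return the letters whose bits are set in `cell`, or 'none' for 0."""
    letters = "".join(ch for k, ch in enumerate(FLAGS) if cell & (1 << k))
    return letters or "none"


# Bit matrix after edge assignment as of commit 950a81cb, copied from above.
bits = [
    [0, 16, 16, 16],
    [64, 4, 66, 50],
    [64, 17, 4, 18],
    [64, 25, 1, 20],
    [64, 73, 73, 4],
]
for row in bits:
    print(" ".join(f"{decode(c):>4}" for c in row))
```

With this mapping every cell decodes to the letters listed above except the last cell of the second row: 50 = 2 + 16 + 32 decodes to bef, not bcg (which would be 70). So either that entry or the assumed mapping is worth a second look.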