Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License
512 stars 167 forks source link

Finding alignment #7

Closed Martinsos closed 10 years ago

Martinsos commented 10 years ago

Mogao bih dodati nalazenje alignmenta, po slicnoj ideji kao kod SSW. Za NW bi trebalo ponovno sve izracunat pa to bas i nema smisla, ali za HW i SHW bi trebalo izracunati samo dio matrice tako da bi se to moglo upotrijebiti. Pogotovo za HW ustvari. Dakle ideja je da se od pozicije gdje je kraj alignmenta krene prema nazad racunati, i tada se ocito racuna samo dio matrice.

Martinsos commented 10 years ago

Finding alignment is not so hard! I know for each block its vertical deltas for every cell and horizontal delta of last cell in block. This means that if I store vertical and horizontal deltas for all blocks, I can reconstruct path of alignment. For each cell on path, I will have to compute part of block it belongs to. I implemented this, it slows whole algorithm about 3, 4 times and takes a lot of memory.

Martinsos commented 10 years ago

Interesting: if query is larger then target, then optimal path will not go through so many blocks and finding alignment will be faster! In HW and SHW this does not affect us because we can not swap query and target, but in NW we can so this should be considered. On the other side, in NW query and target will usually be of similar length, so this does not really help

Martinsos commented 10 years ago

I should speed up alignment, and also make it consume much less memory. Options I have for that: Hirschbergs algorithm, or something similar to what they did in SSW. SSW idea could work pretty well for HW and SHW, since query is small there. For NW I do not believe it will be very useful, so this is where I could consider Hirschberg.

Martinsos commented 10 years ago

I could also just try to use less space, by doing something smart hm. But big problem is that I do not want to allocate memory too often. I could use something like vector, to allocate memory dinamically?

Martinsos commented 10 years ago

I implemented SSW idea of finding alignment. SHW and HW now use NW alignment to find alignment. NW alignment is still implemented like before. Next step is to make NW use less memory, or implement it with Hischberg. I did some tests and it seems like Hirschberg method can be much slower in some cases, so I will not try it so soon.

Martinsos commented 10 years ago

I should change finding of alignment for NW so it uses only space needed for band, not the whole matrix.

Martinsos commented 10 years ago

I will create new issue for this!