Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Possible to customize costs used for gap opening, gap extension, indel and substitutions? #205

Closed ayaanhossain closed 1 year ago

ayaanhossain commented 2 years ago

As you know, the Needleman-Wunsch (NW) algorithm can be customized with different costs for indels, substitutions, gap opening and gap extension. I understand that edlib uses unit costs throughout, but is it possible to supply a custom set of costs to produce alignments that are closer to some specific hypothesis?
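For readers unfamiliar with what such customization looks like, here is a minimal sketch of a Gotoh-style global alignment with separate mismatch, gap-opening and gap-extension costs. It is plain Python and not part of edlib; the function name `affine_alignment_cost` and its parameters are hypothetical, and it assumes a gap of length k costs `gap_open + k * gap_extend`.

```python
def affine_alignment_cost(a, b, mismatch=1, gap_open=2, gap_extend=1):
    """Minimum global alignment cost under affine gap penalties.

    Illustrative only: assumes a gap of length k costs gap_open + k * gap_extend.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    # M: a[i-1] is aligned to b[j-1] (match or mismatch)
    # X: a[i-1] is aligned to a gap (deletion from a)
    # Y: b[j-1] is aligned to a gap (insertion into a)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Either extend an existing gap or open a new one.
            X[i][j] = min(
                X[i - 1][j] + gap_extend,
                M[i - 1][j] + gap_open + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
            )
            Y[i][j] = min(
                Y[i][j - 1] + gap_extend,
                M[i][j - 1] + gap_open + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
            )
    return min(M[n][m], X[n][m], Y[n][m])
```

With `mismatch=1, gap_open=0, gap_extend=1` this reduces to plain unit-cost edit distance, which is the only scoring edlib computes. The sketch fills three full O(n·m) tables, so it is meant to illustrate the scoring model rather than to be fast.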

iAvicenna commented 1 year ago

I was also looking for an answer to this question. Aligning a long DNA sequence against a version of itself that differs only by substitutions can lead to many indels instead (rightfully so, with even better alignment scores most of the time, but biologically unreasonable). It would be super useful if it were possible to implement indel weights in edlib.
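A toy illustration of this effect, as a sketch in plain Python (not edlib; `weighted_distance`, `sub_cost` and `indel_cost` are hypothetical names): under unit costs the cheapest alignment of two substitution-only variants can use indels, while a higher indel weight pushes the optimum back to substitutions.

```python
def weighted_distance(a, b, sub_cost=1, indel_cost=1):
    """Global alignment cost with one substitution cost and one indel cost."""
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * indel_cost]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match / substitution
                prev[j] + indel_cost,                         # delete ca
                cur[j - 1] + indel_cost,                      # insert cb
            ))
        prev = cur
    return prev[-1]


# "CACA" differs from "ACAC" by a substitution at every position.
unit = weighted_distance("ACAC", "CACA")                     # unit costs
weighted = weighted_distance("ACAC", "CACA", indel_cost=3)   # indels penalized
```

Here `unit` is 2 (delete the leading `A`, match `CAC`, insert a trailing `A`), whereas `weighted` is 4 (four substitutions), which is the substitution-only alignment described above.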

rob-p commented 1 year ago

I doubt this would be supported here. Edlib is an edit-distance-based alignment library, not an arbitrary alignment scoring library. The fact that it computes edit distances allows substantial practical optimizations, which is what makes it so fast. Many of these wouldn't be possible in the same way with general linear or affine scoring parameters. If you need those other scoring functions, perhaps give https://github.com/smarco/WFA2-lib a try.
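To give a flavour of the kind of optimization unit costs allow, below is a sketch of a Myers-style bit-parallel edit distance (in the global-distance form described by Hyyrö). As far as I understand, edlib builds on this family of algorithms, but this is not edlib's code: the function name is hypothetical, and Python's arbitrary-size integers stand in for the blocks of machine words a C/C++ implementation would use. The trick depends on adjacent DP cells differing by at most 1, which holds only for unit costs.

```python
def bitparallel_edit_distance(a, b):
    """Unit-cost (Levenshtein) distance via bit-vector column updates."""
    m = len(a)
    if m == 0:
        return len(b)
    mask = (1 << m) - 1
    high = 1 << (m - 1)

    # peq[c] has bit i set when a[i] == c.
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    # pv / mv encode the +1 / -1 vertical differences of the current DP column.
    pv, mv = mask, 0
    score = m  # D[m][0]: distance from a to the empty prefix of b
    for ch in b:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask  # horizontal +1 differences
        mh = pv & xh                   # horizontal -1 differences
        # The bottom row's horizontal difference updates the running score.
        if ph & high:
            score += 1
        if mh & high:
            score -= 1
        # Shift in +1 at the top: the first DP row increases by 1 per column.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score
```

Each character of `b` updates an entire DP column with a handful of word operations, because the column is stored as differences in {-1, 0, +1}. With arbitrary substitution or gap costs the differences between neighbouring cells no longer fit in that set, which is one reason general linear or affine scoring cannot reuse the same trick directly.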

Martinsos commented 1 year ago

Thanks @rob-p, that is exactly right! Edlib gets its speed from the properties of edit distance, and there is no intention to support different scoring mechanisms. There are other solutions that are better at that; edlib instead focuses on edit distance (notice the name: EDit distance LIBrary, hence EDLIB!). If you want a wider array of alignment methods / scoring systems, check out https://www.seqan.de/ .