ayaanhossain closed 2 years ago
I was also looking for an answer to this question. Aligning a long DNA sequence against a version of itself that differs only by substitutions can lead to many indels instead (rightfully so, since those alignments usually score even better, but they are not biologically reasonable). It would be super useful if indel weights could be implemented in edlib.
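To make the effect concrete, here is a minimal sketch in plain Python (not edlib's API). The sequences are a contrived toy case in which every base is substituted, and the helper names `hamming` and `edit_distance` are made up for this illustration. Unit-cost edit distance explains the pair with two indels, whereas a substitution-only explanation needs eight mismatches.

```python
def hamming(a, b):
    """Number of mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Unit-cost (Levenshtein) distance using a rolling DP row."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]  # aligning a[:i] against the empty prefix of b costs i deletions
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j - 1] + (x != y),  # match or substitution
                prev[j] + 1,             # gap in b (deletion from a)
                cur[j - 1] + 1,          # gap in a (insertion into a)
            ))
        prev = cur
    return prev[-1]


# Toy pair: `mutated` differs from `original` by substitutions only.
original = "ACACACAC"
mutated = "CACACACA"

substitutions_needed = hamming(original, mutated)
unit_cost_distance = edit_distance(original, mutated)
print(substitutions_needed, unit_cost_distance)
```

With unit costs, deleting the leading `A` and appending an `A` at the end is cheaper than substituting every position, so the optimal alignment contains indels even though none actually occurred.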
I doubt this would be supported here. Edlib is an edit-distance-based alignment library, not an arbitrary alignment scoring library. The fact that it computes edit distances allows substantial practical optimizations, which is what makes it so fast. Many of these wouldn't be possible in the same way with general linear or affine scoring parameters. If you need those other scoring functions, perhaps give https://github.com/smarco/WFA2-lib a try.
Thanks @rob-p, that is exactly right! Edlib gets its speed from the properties of edit distance, and there is no intention to support different scoring mechanisms. Other solutions are better at that; edlib focuses on edit distance instead (notice the name: Edit Distance LIBrary -> EDLIB!). If you want a wider array of alignment methods or scoring systems, check out SeqAn: https://www.seqan.de/
As you know, the Needleman-Wunsch (NW) algorithm can be customized with different costs for indels, substitutions, gap opening and gap extension. I understand that everything is unit-cost in edlib, but is it possible to use a custom set of costs to produce alignments that are closer to some specific hypothesis?
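For reference, this is roughly what the requested behaviour looks like outside edlib: a minimal Needleman-Wunsch sketch in plain Python with separate substitution and gap costs. The function name `weighted_global_align` and its parameters are hypothetical, it assumes integer costs and a linear gap penalty only (separate gap opening and extension costs would need an affine-gap formulation such as Gotoh's), and it uses quadratic time and memory, so it is meant for illustration rather than long sequences.

```python
def weighted_global_align(a, b, mismatch=1, gap=1):
    """Global alignment minimising total cost with linear gap costs.

    Returns (cost, aligned_a, aligned_b), where '-' marks a gap.
    Integer costs are assumed so the traceback comparisons are exact.
    """
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + gap,      # gap in b
                cost[i][j - 1] + gap,      # gap in a
            )

    # Traceback from the bottom-right cell, preferring the diagonal on ties.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        sub = mismatch
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            sub = 0
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))


a, b = "ACACACAC", "CACACACA"
unit = weighted_global_align(a, b, mismatch=1, gap=1)      # edit-distance behaviour
weighted = weighted_global_align(a, b, mismatch=1, gap=5)  # indels made expensive
print(unit)
print(weighted)
```

With `mismatch=1, gap=1` the result matches unit-cost edit distance and the alignment contains gaps; raising `gap` to 5 makes the substitution-only alignment the cheapest one. Libraries built for general scoring, such as the ones linked in this thread, are the practical route for real data.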