Closed rderelle closed 6 years ago
Hi @romain22, thank you for opening an issue. That is a completely reasonable feature request. However, edlib is an edit distance library and as such does not support Gotoh (affine gap penalties), which is what you described. Due to its nature, it cannot support them, as the result would not be edit distance any more. There are other algorithms (and libraries) out there that offer support for that!
This is not strictly speaking an issue, more a request for a further improvement.
With biological data (DNA, RNA or protein sequences), one usually considers the deletion of n adjacent characters to be the result of a single evolutionary event (the same applies to insertions).
The problem is that edlib will score n edits for the deletion/insertion of n adjacent characters. For instance, in this case edlib will return an editDistance of 3:

    TAGCGTAGCTAGCCTATTATCG
    TAGCGTAGCTA---TATTATCG

... while the most parsimonious answer is 1 (i.e. 1 change, consisting of the insertion/deletion of GCC).

NB: I believe this reasoning is correct for the comparison of any kind of string.
So, I was wondering if it would be possible to add an option to edlib.align() to score 1 edit for any insertion/deletion of n adjacent characters.
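To make the requested scoring concrete, here is a minimal pure-Python sketch. It does not use edlib and is not how edlib works internally. The function names `levenshtein` and `block_indel_distance` are hypothetical, and the second one assumes a cost of 1 per mismatch, 1 for opening a run of insertions or deletions, and 0 for extending that run (a three-state recurrence in the style of affine-gap alignment).

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: every inserted or deleted character costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match / mismatch
        prev = cur
    return prev[-1]


def block_indel_distance(a: str, b: str) -> int:
    """Assumed scoring: mismatch = 1, opening a gap run = 1, extending it = 0."""
    n, m = len(a), len(b)
    inf = math.inf
    # M: alignment ends with a[i-1] paired with b[j-1]
    # D: alignment ends inside a run of deletions (a[i-1] against a gap)
    # I: alignment ends inside a run of insertions (b[j-1] against a gap)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    I = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        D[i][0] = 1  # one deletion run covering a[:i]
    for j in range(1, m + 1):
        I[0][j] = 1  # one insertion run covering b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = min(M[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1])
            M[i][j] = diag + (a[i - 1] != b[j - 1])
            # Staying in the same kind of run is free; starting one costs 1.
            D[i][j] = min(D[i - 1][j], M[i - 1][j] + 1, I[i - 1][j] + 1)
            I[i][j] = min(I[i][j - 1], M[i][j - 1] + 1, D[i][j - 1] + 1)
    return min(M[n][m], D[n][m], I[n][m])


if __name__ == "__main__":
    s1 = "TAGCGTAGCTAGCCTATTATCG"
    s2 = "TAGCGTAGCTATATTATCG"
    print(levenshtein(s1, s2))           # per-character scoring
    print(block_indel_distance(s1, s2))  # per-run scoring
```

On the two sequences from the example, the first function counts the three deleted characters individually, while the second counts the whole GCC run once. The sketch is quadratic in time and memory, so it only illustrates the scoring and is not a substitute for an optimized aligner.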
thanks.