Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Sequence match (result.alignment) flags does not match the description in edlib.h #183

Closed ramana-athreya closed 3 years ago

ramana-athreya commented 3 years ago

I downloaded the edlib files yesterday for use in my programs. The definitions of the sequence match flags in unsigned char *result.alignment seem to be inverted for insert/delete.

I have attached screenshots of edlib.h and of the program run, which should be self-explanatory. The program output (screenshot: edlib_output) shows that:

  - result.alignment = 1 is insertion into query / deletion from target
  - result.alignment = 2 is insertion into target / deletion from query

This is the opposite of the definition in edlib.h (screenshot: edlib_includefile).

This may not make any difference to the output of char* edlibAlignmentToCigar(), because the code is internally consistent. Only the comments in edlib.h would need to change.

... my apologies if I have not understood something

Martinsos commented 3 years ago

@ramana-athreya thanks for reporting this!

To make things simpler, let's talk only in terms of insertions and ignore deletions.

When I look at the output of your program, I conclude that 1 is insertion into target and 2 is insertion into query, which matches what the comments say. You conclude the opposite, though. Could you explain your reasoning? I can try to explain mine. In the example above, I interpret the operations like this:

  1. 3 -> mismatch of y onto t.
  2. 1 -> aelephant has an a at the start, but elephone doesn't. We insert a into the target and get aelephone, so this is an insertion into the target.

What might be confusing you are the - characters. They don't represent deletions! They represent insertions. Well, both, in a sense, since a deletion in the target is always equivalent to an insertion in the query, and vice versa. For example, if we have query aba, target aa, and the following alignment:

aba
|.|
a-a

This means that we inserted b into the target, or equivalently that we deleted b from the query. The - marks the place where the insertion happens. What is inserted? Whatever is at the same position in the other sequence.
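
To make the convention concrete, here is a minimal pure-Python sketch; it does not call edlib. It assumes the op codes discussed in this thread (1 = insertion into target, 2 = insertion into query, 3 = mismatch) and, as a further assumption, 0 = match. The constant names and the render function are hypothetical, made up only for illustration.

```python
# Op codes as discussed in this thread. They are assumed to mirror the
# values documented in edlib.h; the names below are hypothetical.
MATCH = 0
INSERT_INTO_TARGET = 1  # equivalently: deletion from query
INSERT_INTO_QUERY = 2   # equivalently: deletion from target
MISMATCH = 3


def render(query, target, ops):
    """Return (query_row, target_row), with '-' marking each gap."""
    q_row, t_row = [], []
    qi = ti = 0
    for op in ops:
        if op == INSERT_INTO_TARGET:
            # The query has a character the target lacks, so the gap
            # appears in the target row; only the query index advances.
            q_row.append(query[qi])
            t_row.append("-")
            qi += 1
        elif op == INSERT_INTO_QUERY:
            # The target has a character the query lacks, so the gap
            # appears in the query row; only the target index advances.
            q_row.append("-")
            t_row.append(target[ti])
            ti += 1
        elif op in (MATCH, MISMATCH):
            # Both sequences contribute one character.
            q_row.append(query[qi])
            t_row.append(target[ti])
            qi += 1
            ti += 1
        else:
            raise ValueError(f"unknown op code: {op}")
    return "".join(q_row), "".join(t_row)


q_row, t_row = render("aba", "aa", [MATCH, INSERT_INTO_TARGET, MATCH])
print(q_row)  # aba
print(t_row)  # a-a
```

The - lands in the target row exactly where op code 1 occurs: that is the position where b would have to be inserted into the target to turn aa into aba.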

Martinsos commented 3 years ago

I will close this one for now, as I think it is a misunderstanding and not a bug, but feel free to continue the discussion. If it turns out otherwise, I will reopen it.