Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Cigar results interpretation #203

Open luchaoqi opened 2 years ago

luchaoqi commented 2 years ago

Hello, thanks for this awesome and fast tool! I am new to edlib and would like to understand the alignment results better, since I couldn't find a good example here.

To Reproduce

import edlib
query = 'AAGGATTACT' # ligation barcode
target = 'AAGGATTACNT' # read sequence - I added an 'N' before the final 'T'
edlib.align(query, target, mode='SHW', task='path')

Results

{'editDistance': 1, 'alphabetLength': 5, 'locations': [(0, 8), (0, 9), (0, 10)], 'cigar': '9=1I'}

Expected behavior

The results above show three end locations, which I assume correspond to a deletion, a mismatch, and an insertion, in that order. However, the cigar only describes one of them (the insertion case). In my real-world data, mismatches are more likely than indels. Is there any plan to address this, or did I miss an option that already achieves it?
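
One way to check which of the three locations the reported cigar belongs to is to count how many target characters it consumes and compare that with the length of each location. The following is a minimal sketch, not part of edlib: `cigar_span` and `matching_location` are hypothetical helpers, and the code assumes that edlib's extended cigar follows the SAM convention, where `=` and `X` consume one character of both sequences, `I` consumes only the query, and `D` consumes only the target. It also assumes that end locations are 0-based and inclusive.

```python
import re

# Assumption: '=', 'X' and 'M' consume one character of both sequences,
# 'I' consumes only the query, 'D' consumes only the target.
CONSUMES = {"=": (1, 1), "X": (1, 1), "M": (1, 1), "I": (1, 0), "D": (0, 1)}


def cigar_span(cigar):
    """Return (query_chars, target_chars) covered by a cigar string."""
    query_len = target_len = 0
    for count, op in re.findall(r"(\d+)([=XMID])", cigar):
        dq, dt = CONSUMES[op]
        query_len += int(count) * dq
        target_len += int(count) * dt
    return query_len, target_len


def matching_location(cigar, locations):
    """Return the (start, end) pairs whose inclusive length equals the cigar's target span."""
    _, target_len = cigar_span(cigar)
    return [loc for loc in locations if loc[1] - loc[0] + 1 == target_len]


# The result dictionary copied from the report above.
result = {
    "editDistance": 1,
    "alphabetLength": 5,
    "locations": [(0, 8), (0, 9), (0, 10)],
    "cigar": "9=1I",
}

print(cigar_span(result["cigar"]))
print(matching_location(result["cigar"], result["locations"]))
```

Under these assumptions, `9=1I` covers 10 query characters but only 9 target characters, so it would describe the alignment ending at `(0, 8)` rather than the one ending at `(0, 10)`. That would mean the cigar is reported for the first location only.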

Environment (please complete the following information):