Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Segmentation fault #86

Closed: edwardanderson closed this issue 6 years ago

edwardanderson commented 6 years ago

Hello,

I've encountered a segmentation fault. It happens when comparing a string that contains a character with a diacritic (though not every such character) combined with a punctuation character (seemingly any) against another string. It's reproducible.

Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux

>>> import edlib
>>> edlib.align('ä:', 'a:')
Segmentation fault (core dumped)
>>> edlib.align('ä', 'a')
{'editDistance': 0, 'alphabetLength': 1, 'cigar': None, 'locations': [(None, 0)]}
>>> edlib.align(':', '-')
{'editDistance': 1, 'alphabetLength': 2, 'cigar': None, 'locations': [(None, 0)]}

How can I help debug?

Martinsos commented 6 years ago

Hi @edwardanderson, thanks for reaching out and thank you for your patience! I am having a hard time reproducing this problem. Here is my output:

Python 3.6.1 (default, Mar 27 2017, 00:27:06) 
[GCC 6.3.1 20170306] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import edlib
>>> edlib.align('ä:', 'a:')
{'editDistance': 2, 'alphabetLength': 4, 'locations': [(None, 1)], 'cigar': None}
>>> edlib.align('ä', 'a')
{'editDistance': 1, 'alphabetLength': 2, 'locations': [(None, 0)], 'cigar': None}
>>> edlib.align(':', '-')
{'editDistance': 1, 'alphabetLength': 2, 'locations': [(None, 0)], 'cigar': None}
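A note on the first result above (a reading of the numbers, not something confirmed in this thread): 'ä' (U+00E4) is two bytes in UTF-8, 0xC3 0xA4. If the binding aligns UTF-8 bytes rather than characters, 'ä:' vs 'a:' becomes a 3-byte vs 2-byte comparison over the alphabet {0xC3, 0xA4, ':', 'a'}, which matches editDistance 2 and alphabetLength 4. The single-character call ('ä' vs 'a') does not fit this reading as cleanly, so treat it as a hypothesis. The sketch below checks the arithmetic with a plain dynamic-programming Levenshtein distance (not edlib's own Myers bit-vector implementation); `levenshtein` and `summarize` are hypothetical helpers.

```python
# Sketch: edit distance and alphabet size at character level vs UTF-8 byte level.
# Uses the textbook DP recurrence, NOT edlib's algorithm; it only counts edits.

def levenshtein(a, b):
    """Classic O(len(a) * len(b)) edit distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def summarize(query, target):
    """Return (distance, alphabet size) for characters and for UTF-8 bytes."""
    qb, tb = query.encode("utf-8"), target.encode("utf-8")
    return {
        "chars": (levenshtein(query, target), len(set(query) | set(target))),
        "utf8_bytes": (levenshtein(qb, tb), len(set(qb) | set(tb))),
    }

# "\u00e4" is the precomposed 'ä'; written as an escape to avoid any ambiguity.
print(summarize("\u00e4:", "a:"))
# {'chars': (1, 3), 'utf8_bytes': (2, 4)}
```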

Hmm, I am not sure how we can debug this :). Are there any other diacritic characters that cause the problem? Can you replicate it with the C++ library (hm, I should also try that)? Do you have more info on the core dump?
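For getting more than "Segmentation fault (core dumped)" out of the Python side, one option is the standard-library `faulthandler` module (Python 3.3+), which prints the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. A minimal sketch, assuming a POSIX system; `run_with_faulthandler` is a hypothetical helper, and running the call in a child process keeps the crash from taking down your own session. This only shows where in the Python code the crash happened, not where inside the C/C++ extension.

```python
# Sketch: run a suspect snippet in a child interpreter with faulthandler enabled.
import subprocess
import sys

def run_with_faulthandler(snippet):
    """Run `snippet` in a fresh Python process; return (returncode, stderr).

    On POSIX a negative return code means the child was killed by that
    signal, e.g. -11 for SIGSEGV."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.5)
    )
    return proc.returncode, proc.stderr

if __name__ == "__main__":
    # The child decodes the \u00e4 escape itself, so argv stays plain ASCII.
    code, err = run_with_faulthandler(
        "import edlib; print(edlib.align('\\u00e4:', 'a:'))")
    print("return code:", code)
    print(err)
```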

edwardanderson commented 6 years ago

Hm. I took a look at the virtual env:

$ pip freeze
edlib==1.1.2.post2

Removing and reinstalling moved me on to v1.2.0, which fixes the issue :)
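Since the fix here was simply moving from 1.1.2.post2 to 1.2.0, a script can check for the older version up front. A minimal sketch, assuming Python 3.8+ (`importlib.metadata`, so not the 3.5 interpreter from the report) and treating 1.2.0 as the minimum only because that is the version that worked for the reporter; the thread does not say which release actually fixed the crash. `numeric_prefix` and `is_older_than` are hypothetical helpers, and the parsing looks only at leading numeric components, which is not full PEP 440 (use the `packaging` library for that).

```python
# Sketch: detect an installed distribution older than a known-good version.
import re
from importlib import metadata

def numeric_prefix(version):
    """'1.1.2.post2' -> (1, 1, 2); trailing zeros dropped so '1.2' == '1.2.0'."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break  # e.g. 'post2': stop at the first non-numeric component
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the suffix
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)

def is_older_than(dist, minimum):
    """True if `dist` is installed and its numeric version is below `minimum`."""
    try:
        installed = metadata.version(dist)
    except metadata.PackageNotFoundError:
        return False
    return numeric_prefix(installed) < numeric_prefix(minimum)

if is_older_than("edlib", "1.2.0"):
    print("edlib is older than 1.2.0; consider: pip install --upgrade edlib")
```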

Martinsos commented 6 years ago

Awesome, I guess I somehow fixed it in the meantime :D.