Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Python: Add support for arbitrary sequences of hashable objects #128

Closed jbaiter closed 5 years ago

jbaiter commented 5 years ago

This implements @Martinsos' suggestion from https://github.com/Martinsos/edlib/issues/79 to add support for sequences of arbitrary hashable objects in the Python bindings. If either query or target contains non-ASCII values, they are mapped into an ASCII alphabet, and the resulting byte sequences are used for the alignment.

One limitation that we can't get around at the moment is that the query and target sequence together must not contain more than 256 unique values.
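To make the mapping idea and the 256-value limit concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the helper name `map_to_bytes` and the constant `MAX_ALPHABET_SIZE` are hypothetical, and the sketch assumes that each unique value is simply assigned the next free one-byte code in order of first appearance.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Assumption: one byte per symbol, hence at most 256 distinct values overall.
MAX_ALPHABET_SIZE = 256


def map_to_bytes(
    query: Sequence[Hashable], target: Sequence[Hashable]
) -> Tuple[bytes, bytes]:
    """Hypothetical helper: encode both sequences over one shared byte alphabet."""
    codes: Dict[Hashable, int] = {}
    # Walk the query first, then the target, so equal values share one code.
    for symbol in list(query) + list(target):
        if symbol not in codes:
            if len(codes) >= MAX_ALPHABET_SIZE:
                raise ValueError(
                    "query and target together contain more than "
                    f"{MAX_ALPHABET_SIZE} unique values"
                )
            codes[symbol] = len(codes)
    # Re-encode each sequence as bytes using the shared code table.
    return bytes(codes[s] for s in query), bytes(codes[s] for s in target)


if __name__ == "__main__":
    q, t = map_to_bytes(["the", "quick", "fox"], ["the", "slow", "fox"])
    print(q, t)
```

Because both sequences share one code table, two positions compare equal after encoding exactly when the original objects were equal, so the edit distance over the byte strings should match the edit distance over the original sequences. The encoded byte strings could then, in principle, be passed to `edlib.align` in place of the original inputs.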

It certainly is not the ideal way to go about this, but it should serve as an acceptable workaround for a lot of use cases until https://github.com/Martinsos/edlib/issues/90 is implemented.

This should help with #123, #114, #109, #104, #89 and #79.

Martinsos commented 5 years ago

@jbaiter thanks, this is very cool :)! I am certainly in for integrating this. I will take a look at it over the weekend and write comments on the code to make sure we get it right, so stay tuned for that. Thanks again :)!

jbaiter commented 5 years ago

Thanks for the thorough review!