Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

I made bindings for Julia #146

Closed cjdoris closed 4 years ago

cjdoris commented 4 years ago

Thought you'd like to know I made edlib bindings for Julia, here: https://github.com/cjdoris/Edlib.jl

Maybe it could be mentioned in the README, alongside the Python bindings?

Martinsos commented 4 years ago

Sorry for not responding sooner, this is really cool :)! Sure, I will add a link to the README :)!

How was your experience using edlib, and what led you to make bindings for Julia? What have you been using Edlib for, if you don't mind me asking? Thanks!

Martinsos commented 4 years ago

Added to README! Thanks :)!

cjdoris commented 4 years ago

Thanks for adding the link!

I just needed a fast edit distance and couldn't see one for Julia, and your library seems to be rated the fastest, at least for longish strings.

It's really simple to make bindings for C libraries in Julia. Firstly, there is the Yggdrasil.jl project, which provides prebuilt portable binaries for a lot of libraries, so I added Edlib to that (https://github.com/JuliaPackaging/Yggdrasil/tree/master/E/Edlib). That basically amounts to telling it how to build the library; it then goes and builds it for every platform. Then my wrapper library Edlib.jl loads the library built by Yggdrasil.jl, uses Julia's C interface to call its functions directly, and provides a more Julia-friendly API.

cjdoris commented 4 years ago

(And thanks of course for writing a very useful library!)

Martinsos commented 4 years ago

Awesome :)! Yes, Edlib should be among the fastest for longer strings. Additional question: would it be much harder to create bindings for a C++ library in Julia? We are working on a new feature for Edlib and are considering dropping the C API and going with just a C++ API, so I am wondering how problematic that would be.

cjdoris commented 4 years ago

Julia does also have a C++ interface, though I haven't used it much. If it's possible to keep a C API going, it's always appreciated for simplicity. What are you doing that requires C++?

Martinsos commented 4 years ago

We (@masri2019 is really doing all the work, I am just guiding :)) are adding the ability to provide generic sequences to edlib, meaning that a sequence no longer has to be a const char* in C/C++; instead it can be an array of anything that satisfies a couple of properties, like comparison and maybe something else. This is the most requested feature from Edlib users, and the big thing is that it allows input sequences over an alphabet of any size (so far the alphabet size was restricted to 256 due to the size of char), so those with larger alphabets can also use Edlib. However, we are using C++ generics for this, which means the C API can't be mapped onto it so easily. The idea is to forget the C API for now, make it work just for C++, and then figure out how to implement a C API over it. We were also thinking of not implementing it at all if people don't use it, which is why I am asking: to check whether anybody would even notice if the C API were gone.

You can see the progress on the gen-seqs branch; the most visible change for users is the new definition of the edlibAlign method: CODE .

I also often felt that the C API is stopping us from making some parts of the API nicer, so that is one more reason.
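
To make the generic-sequence idea above concrete, here is a minimal sketch in C++. It is not the code from the gen-seqs branch and not Edlib's algorithm: `genericEditDistance` is a hypothetical name, and the body is the plain textbook dynamic program rather than anything optimized. It only assumes that the sequence type offers `size()` and `operator[]`, and that its elements can be compared with `==`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, NOT part of Edlib: edit (Levenshtein) distance between
// two sequences whose elements only need to support operator==.
// Textbook two-row dynamic program, O(|a| * |b|) time, O(|b|) memory.
template <typename Seq>
std::size_t genericEditDistance(const Seq& a, const Seq& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // prev[j] = distance between the first i-1 elements of a and the first j of b.
    std::vector<std::size_t> prev(m + 1), curr(m + 1);
    std::iota(prev.begin(), prev.end(), std::size_t{0});  // row for the empty prefix of a

    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = i;  // deleting i elements of a yields the empty prefix of b
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0u : 1u;
            const std::size_t substitution = prev[j - 1] + cost;
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    return prev[m];
}
```

Because the element type is a template parameter, the same function accepts a `std::string` as well as a `std::vector<int>` whose values go beyond 255, which is the larger-alphabet case mentioned above. It also shows why a C API does not map onto this directly: C has no templates, so a C function has to commit to one concrete element type.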

cjdoris commented 4 years ago

That sounds cool, and makes a lot of sense. Hopefully I'll be able to call the C++ version from Julia (I'll try that out some time), and if not, I can help out with adding a C API back in.

Martinsos commented 4 years ago

We might also need the C API for the Python binding that we are maintaining, so let's see :), but yes, it shouldn't be too hard to add it; it is more about figuring out how to shape it.
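
One possible shape for such a C API, sketched here purely as an assumption and not as anything the Edlib maintainers have decided, is a thin `extern "C"` layer that instantiates a templated core for a few fixed-width element types. The names `example_edit_distance_u8` and `example_edit_distance_u32` are hypothetical, and the core is again the textbook dynamic program, not Edlib's algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {

// Generic core (hypothetical, textbook single-row DP; NOT Edlib's algorithm).
// Works on raw pointer + length pairs so it is easy to call from a C wrapper.
template <typename T>
std::size_t editDistance(const T* a, std::size_t n, const T* b, std::size_t m) {
    std::vector<std::size_t> row(m + 1);
    for (std::size_t j = 0; j <= m; ++j) row[j] = j;

    for (std::size_t i = 1; i <= n; ++i) {
        std::size_t diag = row[0];  // cell (i-1, j-1)
        row[0] = i;
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t up = row[j];  // cell (i-1, j)
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0u : 1u;
            row[j] = std::min({diag + cost, up + 1, row[j - 1] + 1});
            diag = up;
        }
    }
    return row[m];
}

}  // namespace

// C-callable entry points: one per supported element width. Callers such as
// Julia's ccall or a Python extension only see plain pointers and lengths.
extern "C" {

std::size_t example_edit_distance_u8(const std::uint8_t* a, std::size_t n,
                                     const std::uint8_t* b, std::size_t m) {
    return editDistance(a, n, b, m);
}

std::size_t example_edit_distance_u32(const std::uint32_t* a, std::size_t n,
                                      const std::uint32_t* b, std::size_t m) {
    return editDistance(a, n, b, m);
}

}  // extern "C"
```

The trade-off in this shape is that the C side supports only the element types the library chooses to export, while C++ users keep the fully generic interface.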