Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Make sequence more abstract, so it can be anything, not just array of chars. #90

Open Martinsos opened 6 years ago

Martinsos commented 6 years ago

Multiple people were asking about support for multibyte characters (Unicode). One way to provide that, and even more, is to make a sequence not an array of chars but an array of objects that have an equality operator defined over them.

What would the impact on speed be in this case? I think it would not be a big impact, since the sequence elements are only used to calculate Peq anyway, and after that only Peq is used.
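
As a rough illustration of why the element type matters only up front, the sketch below builds Peq-style match bitmasks for a generic query. It assumes the usual Myers bit-vector definition (bit i of the mask for a symbol is set when `query[i]` equals that symbol). The names `symbolId` and `buildPeq` are hypothetical, and this is not edlib's actual code. Elements are mapped to small integer ids using only `operator==`; after that, everything works on 64-bit words.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, NOT edlib's implementation.
// Returns the index of x in alphabet, appending it if not yet present.
// Uses only operator==, so lookup is linear in the alphabet size.
template <typename T>
std::size_t symbolId(std::vector<T>& alphabet, const T& x) {
    for (std::size_t k = 0; k < alphabet.size(); ++k)
        if (alphabet[k] == x) return k;
    alphabet.push_back(x);
    return alphabet.size() - 1;
}

// Builds peq[id][block]: bit (i % 64) of block (i / 64) is set
// exactly when query[i] equals the symbol with that id.
template <typename T>
std::vector<std::vector<std::uint64_t>> buildPeq(const std::vector<T>& query,
                                                 std::vector<T>& alphabet) {
    const std::size_t numBlocks = (query.size() + 63) / 64;
    std::vector<std::vector<std::uint64_t>> peq;
    for (std::size_t i = 0; i < query.size(); ++i) {
        const std::size_t id = symbolId(alphabet, query[i]);
        // Grow the table when a new symbol (or a pre-seeded id) is seen.
        if (id >= peq.size())
            peq.resize(id + 1, std::vector<std::uint64_t>(numBlocks, 0));
        peq[id][i / 64] |= std::uint64_t{1} << (i % 64);
    }
    return peq;
}
```

In this sketch the cost of generic elements is confined to the id lookup; a target element that never occurs in the query would simply correspond to an all-zero mask. A hash-based or ordered map could replace the linear search if the element type also supports hashing or ordering, which is an extra requirement beyond equality.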

Would it make it harder to use edlib for the usual cases? Would it become too general, hard to use for strings? How could we make sure it is still easy to use while offering flexibility?

Finally, this might be easier to implement if I decide beforehand to go with just a C++ interface, so I should think about that first.

Martinsos commented 6 years ago

So far there have been 3 issues asking for multibyte support, so I assigned the important label to this feature, as it seems to be important to users.

Martinsos commented 5 years ago

With @jbaiter's addition to the Python version of Edlib this issue is less pressing, but it should still be the next one to do.

Martinsos commented 4 years ago

This is also linked to https://github.com/Martinsos/edlib/issues/141 (Unicode support in Python edlib).

Martinsos commented 3 years ago

@masri2019 has been working on this for some time now with a little bit of my guidance, so I will document here what has been done and what is yet to be done to call this feature complete!

We are collecting these changes in a "big" feature branch, gen-seqs, and will merge it back into master once the feature is done.

Additional ideas/considerations:

Martinsos commented 2 years ago

Hey @masri2019, how are you doing? We made great progress with this one and then stopped -> are you still interested in possibly continuing with it, how are you with time?

mobinasri commented 2 years ago

Hi Martin!

Thanks for asking. Yes, I'm definitely interested in finishing what we have started. I have been busy with some other projects, but I can plan to dedicate some time to edlib. Based on what you sent, the next step is updating the README. I'll create a pull request for that.

-Mobin

Martinsos commented 2 years ago

@masri2019 that is awesome :)!! I will also do my best to help you. I believe the two of us can finish it together; if needed I can involve myself more, and I should also be able to carve out some time.

Yes, the next step is the README, based on the checklist I created above (which I am now really happy I made because I would have no idea where we stopped otherwise :D). And then the Python bindings. I am sure we can get both of those done.

Next will be the discussion about the C wrapper. That might be a bit harder, but it is also doable. And then final polishing!

Altogether, it sounds like we (you) did the hardest part already, so I am really looking forward to this. Although, you know how they say: the last 20% takes 80% of the time. But let's hope that in this case the percentages will be gentle to us.

Martinsos commented 2 years ago

@masri2019 I am guessing it might be a bit hard getting back into it after so much time, so I would advise you to do what you can, and if you get stuck somewhere, no worries: make a draft PR and I can jump in, and we will figure it out together. I have also forgotten a lot of things, but I am sure we will remember them relatively quickly, since we were writing pretty nice code.