Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License
493 stars 162 forks source link

Enabling additional equalities for generic sequences (using sets) #154

Closed mobinasri closed 3 years ago

mobinasri commented 4 years ago

This PR contains the implementation of an unordered_set of pairs and using it to activate additional equalities for generic sequences.

Here is the list of main changes:

  1. Added pairhash.hpp which contains a hash function for std::pair
  2. If we are given an array of additional equality pairs, an unordered_set of these pairs is created. For example if we are given these pairs; (a,b) , (b,c), (a,d) -> we make this set { (a,b), (b,a), (b,c), (c,b), (a,d), (d,a) } which contains both permutations for each pair.
  3. This unordered_set is used for checking the equality of any two elements. To speed up the process, searching the set is avoided as much as possible. We check the below conditions in advance, if any of them is true we ignore searching the set;
      1. If two given elements are the same
      1. If at least one of the given elements is not a member of any additional equality pair (using a prepossessed boolean array)

It can speed up the process because we expect that a large portion of comparisons fall into at least one of the above scenarios.

Martinsos commented 3 years ago

I somehow missed the changes you made following my comments!

So, all is good I believe and we can merge, but I see that continuous integration on AppVeyor is failing, with error C:\projects\edlib\edlib\include\edlib.hpp(140): error C4519: default template arguments are only allowed on a class template [C:\projects\edlib\runTests.vcxproj]. I found this SO Q/A about it: https://stackoverflow.com/questions/2447458/default-template-arguments-for-function-templates . We should fix this before proceeding. It seems that having default templates on functions is problem, and while modern compilers support it, some older ones still don't, especially on windows. I wonder what to do about it? Should we not have defaults? Or should we say: you need to use this new compiler and so on? Because allegedly, it is just a legacy thing, which Bjourne said is just preventing good programming practices today.

Martinsos commented 3 years ago

I somehow missed the changes you made following my comments!

So, all is good I believe and we can merge, but I see that continuous integration on AppVeyor is failing, with error C:\projects\edlib\edlib\include\edlib.hpp(140): error C4519: default template arguments are only allowed on a class template [C:\projects\edlib\runTests.vcxproj]. I found this SO Q/A about it: https://stackoverflow.com/questions/2447458/default-template-arguments-for-function-templates . We should fix this before proceeding. It seems that having default templates on functions is problem, and while modern compilers support it, some older ones still don't, especially on windows. I wonder what to do about it? Should we not have defaults? Or should we say: you need to use this new compiler and so on? Because allegedly, it is just a legacy thing, which Bjourne said is just preventing good programming practices today.

Or maybe we just shouldn't care about VS 10 and drop it from AppVeyor check hm! I like that better, if people complain we can see what to do, but I don't like not using modern features just to support very old systems.

Martinsos commented 3 years ago

@masri2019 ok, I figured it out! I updated AppVeyor (our CI service for Windows) to use newer versions of Visual Studio, so hopefully that will help. I did that update on master, so you could cherry pick it here if you want or we will rebase later.

This PR is ready to merge, but could you first squash the commits? Let's squash them into optimaly one (or multiple if there are distinct sets of changes) commit, and then I can merge it.

Martinsos commented 3 years ago

Also, this change just happened on master: https://github.com/Martinsos/edlib/pull/158 . Since we did big changes, I wouldn't want us to miss that and override once we merge, so maybe we should do it now here also? Or we write it down and remember later when merging to keep an eye out for it.

mobinasri commented 3 years ago

Hi @Martinsos, My apologies for being so late. Because of fire evacuation order in California and its conflict with some of my university deadlines I completely missed to finish this PR. I just added a commit based on #158 and re-based this branch.

Martinsos commented 3 years ago

@masri2019 no problem, I didn't forget about it but assumed you are on vacation and did not want to bother you! Sorry to hear about you being affected by the situation in California, I hope all is well! Great work, I merged this. I also dedicated special issue to make it easier for us to track progress, check out this comment: https://github.com/Martinsos/edlib/issues/90#issuecomment-699495428 , so we can now use it to see what needs to be done next and how is the progress going.

Next step would be updating main README.md according to the changes you have made -> would you like to take that on? Let me know how you would like to continue, I would love it of course if you take on as much as possible :D, but I understand if you have time limits and similar.

mobinasri commented 3 years ago

@Martinsos Thanks for your understanding. Great idea for tracking the progress. I'd love to take the next steps with your guidance. I start updating the README.md this week and send you the PR as soon as it is done.

Martinsos commented 3 years ago

@Martinsos Thanks for your understanding. Great idea for tracking the progress. I'd love to take the next steps with your guidance. I start updating the README.md this week and send you the PR as soon as it is done.

Awesome let's continue then :)!