anhaidgroup / py_stringsimjoin

Scalable String Similarity Joins in Python
BSD 3-Clause "New" or "Revised" License

Add self-join functionality #14

Open adelaneh opened 6 years ago

adelaneh commented 6 years ago

When the two input string sets are the same, the join does redundant work: the time spent tokenizing the strings and computing similarities, and the space needed to store and return the results, are all at least twice what is necessary. It would be much more efficient to add a dedicated self-join function for each join algorithm.
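To illustrate where the savings come from, here is a minimal pure-Python sketch, not the library's implementation. It assumes whitespace tokenization and Jaccard similarity, and the names `whitespace_tokens`, `jaccard`, and `naive_self_join` are hypothetical. Each string is tokenized once, and each unordered pair is compared and reported once, rather than tokenizing two copies of the input and emitting both (a, b) and (b, a).

```python
from itertools import combinations


def whitespace_tokens(s):
    """Hypothetical tokenizer: lowercase, split on whitespace, return a set."""
    return frozenset(s.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_self_join(strings, threshold):
    """Return (i, j, score) with i < j for every pair scoring >= threshold.

    This is a quadratic sketch with no filtering or indexing; it only shows
    the two savings a self-join allows.
    """
    # Saving 1: tokenize each string once instead of once per join side.
    token_sets = [whitespace_tokens(s) for s in strings]

    results = []
    # Saving 2: visit each unordered pair once, skipping (i, i) and (j, i).
    for i, j in combinations(range(len(strings)), 2):
        score = jaccard(token_sets[i], token_sets[j])
        if score >= threshold:
            results.append((i, j, score))
    return results


if __name__ == "__main__":
    names = ["data base systems", "data base system", "string joins"]
    for i, j, score in naive_self_join(names, 0.5):
        print(names[i], "|", names[j], "|", score)
```

A real self-join for each algorithm would presumably keep that algorithm's filtering and indexing and apply the same two ideas on top: build tokens (and any index) once, and only probe pairs in one direction.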