MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License

Faiss #2

Open MaartenGr opened 3 years ago

MaartenGr commented 3 years ago

Faiss allows you to efficiently search and cluster dense vectors. This could be beneficial when comparing cosine similarities between vectors in the TF-IDF and Embeddings models.

However, since Faiss is a conda-only install rather than pip-based, and there is currently no Windows version (aside from the nightly build), it is best to postpone this until the Windows version is stable. It would also require users to install Faiss before installing PolyFuzz, which does not help the user experience.

DGaffney commented 10 months ago

Thread bump? I'm currently running into some scaling issues with PolyFuzz TF-IDF, and Faiss is solving the problem for me. When I get some spare cycles I'd be happy to push up a PR to incorporate it, if that's interesting to you? Seems Faiss installs way easier these days...

MaartenGr commented 10 months ago

If I am not mistaken, Faiss still requires conda to install, right? That would prevent it from being installed through the pip method that is currently used in this repo.

Having said that, it could be used as an additional method of quickly finding the right vectors. So, I am all for it as long as we can keep it a relatively minimal approach 😄
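
For illustration, a minimal sketch of what such an optional, minimal approach might look like. It is not PolyFuzz code: `normalize`, `top1_cosine`, and `HAVE_FAISS` are hypothetical names, and it assumes Faiss is importable as `faiss` when present. Cosine similarity is computed as the inner product of L2-normalized vectors, with Faiss's flat inner-product index used when available and a plain Python brute-force search otherwise, so a pip-only install would still work.

```python
import math

try:
    # Optional dependency: only used when the user has installed it.
    import faiss
    import numpy as np
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False


def normalize(vectors):
    """L2-normalize each vector so that inner product equals cosine similarity."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        out.append([x / norm for x in v])
    return out


def top1_cosine(queries, corpus):
    """Return (best corpus index, cosine similarity) for each query vector."""
    q, c = normalize(queries), normalize(corpus)

    if HAVE_FAISS:
        c_arr = np.asarray(c, dtype="float32")
        q_arr = np.asarray(q, dtype="float32")
        index = faiss.IndexFlatIP(c_arr.shape[1])  # exact inner-product search
        index.add(c_arr)
        sims, ids = index.search(q_arr, 1)  # top-1 neighbour per query
        return [(int(i[0]), float(s[0])) for i, s in zip(ids, sims)]

    # Fallback (hypothetical): brute-force comparison without extra dependencies.
    results = []
    for qv in q:
        sims = [sum(a * b for a, b in zip(qv, cv)) for cv in c]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((best, sims[best]))
    return results
```

Calling `top1_cosine(query_vectors, corpus_vectors)` with two lists of equal-length float vectors gives one `(index, similarity)` pair per query. Both branches return the same exact matches, so Faiss would only change speed, not results, under these assumptions.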