Open pjlambert opened 8 months ago
Thanks for the wonderful suggestion, Peter. We too have tried range search on other projects and found it to be great.
Re: CPU-only support, that's not a problem with the current version of the package: it only uses CPU faiss (primarily because of dependency issues). Feel free to create a pull request for this. We are going to update the package soon (along with the paper and the models; we found a way to increase the amount of data available to us), so if you haven't opened a pull request by then, I can implement it around mid-March.
We are thinking of creating a GPU-only branch for more scaled-up applications. It would not be offered as a pip package, primarily because dependency management is a bit messy with faiss GPU and the other required packages (pip install X doesn't work well).
I am glad that the package is working well for you. Hopefully we'll get close to a version 1.x.x soon.
Abhishek
Hi All, again - wonderful package and just terrific work.
One possible extension you might one day consider would be using FAISS's `range_search` function instead of `search` (see https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes#range-search). This would allow for a "many-to-many" match in the more traditional sense, perhaps aligning the behaviour of the LT package to prior fuzzy matching packages. The main drawback is that it is not GPU-friendly, but works pretty efficiently on CPUs in my experience.
FWIW, my use case is to match the universe of job postings to DnB establishments. I use range_search along with your firm-name embeddings to build a dataset of all pairwise matches above a pretty low similarity threshold (0.5). This gives me a huge set of potential matches. I then run an expectation-maximisation algorithm that considers both similarity scores and other structured covariates (not necessarily exact-matching criteria) such as industry codes, location distance, etc. to resolve the best match from this candidate set.
One day I would be happy to help implement this, if you feel it's something you would want to pursue.
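For anyone curious what the resolution step could look like, below is a rough sketch of one common EM formulation for record linkage (a Fellegi-Sunter style two-class mixture over binary agreement indicators). This is a stand-in, not my exact pipeline: the function name `fellegi_sunter_em`, the starting values, and the assumption that each covariate has been reduced to a 0/1 "agrees" flag (e.g. similarity above some cut-off, same industry code, distance under some limit) are all illustrative.

```python
import numpy as np


def fellegi_sunter_em(gamma, n_iter=50, eps=1e-6):
    """Hypothetical sketch: EM for a two-class (match / non-match) mixture.

    gamma: (n_pairs, n_features) array of 0/1 agreement indicators per candidate pair.
    Returns the posterior match probability per pair, plus the fitted parameters.
    """
    gamma = np.asarray(gamma, dtype=float)
    n_features = gamma.shape[1]

    # Assumed starting values: matches are rare and agree on most features.
    p = 0.1                        # share of candidate pairs that are true matches
    m = np.full(n_features, 0.9)   # P(feature agrees | match)
    u = np.full(n_features, 0.1)   # P(feature agrees | non-match)

    def posterior(p, m, u):
        # Log-likelihood of each pair under each class, assuming features are
        # conditionally independent given the class.
        log_match = np.log(p) + gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m)
        log_non = np.log(1 - p) + gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u)
        return 1.0 / (1.0 + np.exp(log_non - log_match))

    for _ in range(n_iter):
        w = posterior(p, m, u)  # E-step: match probability per pair
        # M-step: re-estimate parameters, clipped away from 0 and 1 so logs stay finite.
        p = float(np.clip(w.mean(), eps, 1 - eps))
        m = np.clip((w @ gamma) / max(w.sum(), eps), eps, 1 - eps)
        u = np.clip(((1 - w) @ gamma) / max((1 - w).sum(), eps), eps, 1 - eps)

    return posterior(p, m, u), p, m, u
```

With the posterior in hand, one simple (again, assumed) way to resolve the candidate set is to keep the highest-probability candidate per job posting, optionally dropping anything below a chosen cut-off.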
Thanks again for all the great work, it's hugely appreciated by many!