ekzhu / datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
https://ekzhu.github.io/datasketch
MIT License
2.52k stars, 294 forks

Is the return of MinHashLSH.query() in order #180

Open charlotte-ling opened 2 years ago

charlotte-ling commented 2 years ago

Is the result of MinHashLSH.query() returned in ascending or descending order of Jaccard similarity?

bdeng3 commented 2 years ago

Same question here. I'm also wondering whether it is possible to get the estimated similarity if we use MinHashLSH.query(), instead of just knowing which keys are above the threshold.

ekzhu commented 2 years ago

> Is the result of MinHashLSH.query() returned in ascending or descending order of Jaccard similarity?

It currently is not. You can sort the results yourself by computing the estimated Jaccard similarity with MinHash. MinHashLSH should be the first step of the retrieval process: it locates promising candidates, reducing the computation you need to spend on filtering and ranking them.
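
For anyone landing here, a minimal sketch of that two-step pattern, covering both the ordering question and the estimated-similarity question. It assumes you keep your own mapping from key to MinHash, since `query()` returns only keys. The helper names (`rank_by_estimated_jaccard`, `demo`, `minhash_by_key`) and the sample documents are made up for illustration and are not part of datasketch.

```python
def rank_by_estimated_jaccard(query_minhash, candidate_keys, minhash_by_key):
    """Return (key, estimated Jaccard) pairs, most similar first.

    Hypothetical helper: `minhash_by_key` is a dict you maintain yourself,
    mapping each key inserted into the LSH index to its MinHash.
    """
    scored = [
        (key, query_minhash.jaccard(minhash_by_key[key]))
        for key in candidate_keys
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def demo():
    # Third-party dependency: pip install datasketch
    from datasketch import MinHash, MinHashLSH

    def make_minhash(tokens, num_perm=128):
        m = MinHash(num_perm=num_perm)
        for token in tokens:
            m.update(token.encode("utf8"))
        return m

    # Illustrative data only.
    docs = {
        "doc1": "minhash estimates jaccard similarity between sets".split(),
        "doc2": "minhash estimates jaccard similarity for large sets".split(),
        "doc3": "a completely unrelated sentence about cooking pasta".split(),
    }
    minhash_by_key = {key: make_minhash(tokens) for key, tokens in docs.items()}

    # Step 1: LSH finds candidate keys (unordered, possibly with false positives).
    lsh = MinHashLSH(threshold=0.3, num_perm=128)
    for key, m in minhash_by_key.items():
        lsh.insert(key, m)

    query = make_minhash("minhash estimates jaccard similarity between sets".split())
    candidates = lsh.query(query)

    # Step 2: score and sort the candidates with the MinHash estimate.
    return rank_by_estimated_jaccard(query, candidates, minhash_by_key)
```

Calling `demo()` returns the candidate keys paired with their estimated similarities, highest first. Because LSH can return false positives, you may also want to drop pairs whose estimate falls below your threshold after ranking.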