Closed: Navein closed this 6 years ago
Hi Navein. By similarity matrix, do you mean the pairwise Jaccard similarity score for every pair of sets? If your goal is to have the exact similarity scores, then this package cannot help you.
If you are okay with approximate Jaccard similarity scores, then you can create a MinHash for each set and compute the scores for all pairs from the MinHashes. This should be faster than computing the exact scores if the sets are mostly larger than the number of hash values used in MinHash, because each comparison then touches a fixed number of hash values instead of every element of both sets.
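To make the idea concrete, here is a minimal sketch of building an approximate pairwise similarity matrix from MinHash signatures. It uses only the Python standard library rather than this package's API, so the names `minhash_signature`, `estimate_jaccard`, `similarity_matrix` and the `num_hashes` parameter are all hypothetical and chosen for illustration. It also assumes every set is non-empty and contains strings. In practice the package's own MinHash objects would take the place of the hand-rolled signatures.

```python
import hashlib
from itertools import combinations


def minhash_signature(items, num_hashes=128):
    """Return num_hashes minimum hash values for a non-empty set of strings.

    Illustrative only: each "hash function" is BLAKE2b with a different salt.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        item.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "little",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def similarity_matrix(sets, num_hashes=128):
    """Build a symmetric matrix of estimated Jaccard scores for all pairs."""
    signatures = [minhash_signature(s, num_hashes) for s in sets]
    n = len(sets)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        score = estimate_jaccard(signatures[i], signatures[j])
        matrix[i][j] = matrix[j][i] = score
    return matrix


example_sets = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"x", "y", "z"},
]
matrix = similarity_matrix(example_sets)
for row in matrix:
    print(["%.2f" % value for value in row])
```

The resulting matrix can be passed to a clustering routine that accepts precomputed similarities (or distances, as 1 minus the score). Note that this still compares every pair, so the cost grows quadratically with the number of sets; only the per-pair cost is reduced.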
Hi, how can I generate a similarity matrix using MinHash LSH? MinHash seems to compute only the Jaccard similarity between two sets, while MinHash LSH outputs a list of candidates according to the similarity threshold that was set. I would like to use the similarity matrix for further clustering, and would like to know whether this is possible with this package.