-
Hi, Dear Development Team,
We recently used `faiss.index_factory(dim, 'Flat', faiss.METRIC_Jaccard)` and `index.search()` to create an index and query it, and found that the results are not precise. We also fo…
-
I am playing with mismo to deduplicate postal addresses in a set of about 10k entries.
After the expectation-maximization step, the odds of half of the record pairs are equal to `10_000_000_000`, hen…
-
Where can I add the Jaccard index and positive predictive value (PPV) in the code to measure accuracy?
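For reference, both metrics can be derived from the same true-positive, false-positive and false-negative counts. The snippet below is only a minimal sketch, assuming binary labels stored in plain Python sequences; the function names (`confusion_counts`, `jaccard_index`, `ppv`) are hypothetical and do not come from any project mentioned here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn) for two equally long sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn


def jaccard_index(y_true, y_pred):
    """Jaccard index = TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denom = tp + fp + fn
    # Assumption: two empty positive sets count as a perfect match.
    return tp / denom if denom else 1.0


def ppv(y_true, y_pred):
    """Positive predictive value (precision) = TP / (TP + FP)."""
    tp, fp, _ = confusion_counts(y_true, y_pred)
    # Assumption: with no positive predictions, PPV is reported as 0.0.
    return tp / (tp + fp) if (tp + fp) else 0.0


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
print(jaccard_index(y_true, y_pred))  # TP=2, FP=1, FN=1, so 2/4
print(ppv(y_true, y_pred))            # 2/3
```

In an evaluation loop these would typically be called once per sample (or per class) right after the prediction is produced, then averaged.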
-
**Describe the bug**
The `similarity_jaccard` method in python-igraph seems to ignore self-loops, while they are actually counted as neighbors. This leads to a discrepancy in the Jaccard similarity c…
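
To make the effect of self-loops concrete, here is a small pure-Python sketch that computes Jaccard similarity of neighbor sets from an edge list, with and without counting a self-loop as a neighbor. It is not python-igraph's implementation; the helper names (`neighbor_sets`, `jaccard`) and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def neighbor_sets(edges, keep_self_loops=True):
    """Build undirected neighbor sets; optionally drop self-loops."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u == v and not keep_self_loops:
            nbrs[u]  # make sure the vertex exists, but add no neighbor
            continue
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def jaccard(nbrs, a, b):
    """|N(a) & N(b)| / |N(a) | N(b)|, defined as 0.0 for an empty union."""
    na, nb = nbrs[a], nbrs[b]
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Toy graph: vertex 0 has a self-loop, plus edges 0-1 and 1-2.
edges = [(0, 0), (0, 1), (1, 2)]

with_loops = neighbor_sets(edges, keep_self_loops=True)
without_loops = neighbor_sets(edges, keep_self_loops=False)

# N(0) = {0, 1} vs N(2) = {1} when the self-loop counts as a neighbor.
print(jaccard(with_loops, 0, 2))     # 0.5
# N(0) = {1} vs N(2) = {1} when the self-loop is ignored.
print(jaccard(without_loops, 0, 2))  # 1.0
```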
-
First of all, thank you for your excellent work. I have followed your steps to train the model on the synapse dataset and then evaluated it, but the result I obtained was only 0.8638526351861252. I un…
-
Hello!
First of all, this is an awesome package, and thank you so much!
I constructed 6 ordination plots with my data, and I am trying to collect them together. However, although I have tried mu…
-
https://blog.nelhage.com/post/fuzzy-dedup/
-
**Is your feature request related to a problem? Please describe.**
`jaccard_string_group()` takes too long on 25 million rows with about 1000 dirty categories, paring down to about 200 clean categori…
-
> If more than two sets are provided, the mean of all pairwise scores is calculated.
It would be great to be able to get a matrix of pairs, for tasks such as hierarchical clustering and pairwise di…
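
As a rough illustration of the requested output, the sketch below builds a symmetric matrix of pairwise Jaccard scores for a list of sets and also shows how the mean of all pairwise scores relates to it. The function names (`jaccard`, `pairwise_jaccard`) are hypothetical and not part of the library being discussed.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard score of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(sets):
    """Return an n x n symmetric matrix (list of lists) of Jaccard scores."""
    n = len(sets)
    # Start from all ones so the diagonal is correct; off-diagonals are overwritten.
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = jaccard(sets[i], sets[j])
    return matrix


sets = [{1, 2, 3}, {2, 3, 4}, {5}]
matrix = pairwise_jaccard(sets)
for row in matrix:
    print(row)

# The mean of all pairwise scores is the mean of the upper triangle.
pairs = list(combinations(range(len(sets)), 2))
mean_score = sum(matrix[i][j] for i, j in pairs) / len(pairs)
print(mean_score)
```

A distance matrix for hierarchical clustering could then be obtained as `1 - score` for each entry.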
-
Hey,
I've been looking at the PARC code, and I noticed that in lines 298-306 (PARC/parc/_parc.py), you first create edgelist and then make a copy of edgelist. The copy is then used in the following…