-
I have a large number of embeddings (768 dimensions) which I am attempting to cluster. I was playing around with datasketch and WeightedMinHash to see if it was possible to use the resulting Jaccard d…
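Roughly what I was experimenting with, as a sketch (the data here is random stand-in data, and the clipping step is my own assumption, since weighted MinHash is only defined for non-negative weights):
```python
import numpy as np
from datasketch import WeightedMinHashGenerator

# Stand-in for the real embeddings: weighted MinHash needs non-negative
# weights, so negative components are clipped to zero here.
rng = np.random.default_rng(42)
embeddings = rng.standard_normal((1000, 768))
weights = np.clip(embeddings, 0.0, None)

wmg = WeightedMinHashGenerator(dim=768, sample_size=128, seed=1)
wm_a = wmg.minhash(weights[0])
wm_b = wmg.minhash(weights[1])

# Estimated weighted Jaccard similarity; one minus this is the distance
# that would feed into a clustering algorithm.
print(wm_a.jaccard(wm_b))
```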
-
Upload the PDF attached to this issue for the [press kit](https://cifras.biodiversidad.co/mas/prensa) download, and update the contact details to:
Sistema de Información sobre Biodiversidad de C…
-
Hi,
I'm trying to use the all_pairs() function to find all the (near-)duplicates in a set of about 14,000 text documents (after first turning them into ngram shingles). However, I'm running up against …
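For reference, the basic shape of this workflow as a sketch; I'm assuming `all_pairs()` from the SetSimilaritySearch package here, and the three-document corpus is a toy stand-in for the real 14,000:
```python
from SetSimilaritySearch import all_pairs

def shingles(text, n=3):
    # character n-gram shingles; word-level shingles work the same way
    return {text[i:i + n] for i in range(len(text) - n + 1)}

docs = ["the quick brown fox", "the quick brown fax", "lorem ipsum dolor"]
sets = [shingles(d) for d in docs]

# Yields (index_x, index_y, similarity) for every pair whose Jaccard
# similarity clears the threshold.
for x, y, sim in all_pairs(sets, similarity_func_name="jaccard",
                           similarity_threshold=0.5):
    print(x, y, sim)
```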
-
When I run the code provided in the example below, the dropdown menu to select a language is missing.
```
if (interactive()) {
  library("shiny")
  library("shi18ny")
  ui
```
-
Hi, I have a question about a large-scale LSH index. If I have billions of documents, I suppose even 1 TB of RAM is not enough to do in-memory LSH. Is there any recommended way to use datasketch for this sce…
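One option I'm aware of, as a sketch (assuming a Redis server on localhost:6379): datasketch's `storage_config` can keep the hash tables in Redis or Cassandra instead of process memory, so the index size is no longer bounded by one machine's RAM, though billions of documents would presumably still need a sharded backend:
```python
from datasketch import MinHash, MinHashLSH

# LSH index backed by Redis rather than in-process dictionaries.
lsh = MinHashLSH(
    threshold=0.8,
    num_perm=128,
    storage_config={
        "type": "redis",
        "basename": b"lsh_index",  # namespace for this index's keys
        "redis": {"host": "localhost", "port": 6379},
    },
)

m = MinHash(num_perm=128)
for token in [b"a", b"b", b"c"]:
    m.update(token)

lsh.insert("doc-0", m)
print(lsh.query(m))  # ['doc-0']
```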
-
Maybe a bit of a dumb question, but I'm a little confused by the `_insert` method in the `MinHashLSH` class:
```python
def _insert(
    self,
    key: Hashable,
    minhash: Union[Mi…
```
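For context, a small usage sketch of the public path that ends up in `_insert`: callers normally go through `insert()`, and `_insert` slices the hash values into bands and files the key under the hash of each band:
```python
from datasketch import MinHash, MinHashLSH

lsh = MinHashLSH(threshold=0.8, num_perm=128)

m = MinHash(num_perm=128)
for token in ["a", "b", "c"]:
    m.update(token.encode("utf8"))

# insert() is the public wrapper around _insert(); querying with the
# same MinHash returns the stored key as a candidate match.
lsh.insert("doc1", m)
print(lsh.query(m))  # ['doc1']
```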
-
I prepared 10 synthetic examples.
```python
import random

values = []
queries = []
count = 1
for _ in range(10):
    value = []
    for _ in range(100):
        value.append(count)
        …
```
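The integer lists then need to be turned into MinHash objects before indexing; `to_minhash` below is a hypothetical helper showing one way to do that:
```python
from datasketch import MinHash

def to_minhash(items, num_perm=128):
    # Hypothetical helper: hash each integer's string form into a MinHash.
    m = MinHash(num_perm=num_perm)
    for x in items:
        m.update(str(x).encode("utf8"))
    return m

a = to_minhash(range(1, 101))   # e.g. one synthetic value set
b = to_minhash(range(51, 151))  # shares 50 of its 100 elements with `a`
print(a.jaccard(b))             # estimate of the true Jaccard, 50/150
```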
-
Hi, this package looks really cool and I'd love to use it for my use case.
I have about 7,000 sets with about 1,000 elements each that I'm using as my index. I also have a set of about 1,000 quer…
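To make the question concrete, here is roughly the shape of what I'm doing, as a sketch with much smaller, made-up sets (the real index has about 7,000 sets of about 1,000 elements each):
```python
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128

def to_minhash(items):
    m = MinHash(num_perm=NUM_PERM)
    for x in items:
        m.update(str(x).encode("utf8"))
    return m

# Made-up stand-ins for the index sets and the query sets.
index_sets = {f"set-{i}": set(range(i * 100, i * 100 + 1000))
              for i in range(100)}
query_sets = [set(range(i * 100 + 50, i * 100 + 1050)) for i in range(10)]

lsh = MinHashLSH(threshold=0.5, num_perm=NUM_PERM)
for key, s in index_sets.items():
    lsh.insert(key, to_minhash(s))

for q in query_sets:
    print(lsh.query(to_minhash(q)))  # keys of candidate near-matches
```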