-
Hello Team,
Thanks for creating this amazing library. Is there any way to use the library for PySpark and SQL instead of Pandas?
-
Find material to share with Diego to give him context on the project.
Topics:
- Data pipelines
- Reproducibility in machine learning projects
- Relation extraction
- Data journa…
-
Hi everyone,
I was following the [tutorial](http://ekzhu.com/datasketch/lshensemble.html#) on MinHashLSHEnsemble and I have a question.
I'm analyzing a batch of data and, for a given query set, f…
-
I noticed that the default hash function sha1_hash32 returns data in little-endian order (
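
To make the byte-order point concrete, here is a minimal sketch using only the standard library. It assumes the hash in question takes the first four bytes of the SHA-1 digest and unpacks them as an unsigned 32-bit integer; the helper names `first4_le` and `first4_be` are hypothetical and not part of any library.

```python
import hashlib
import struct


def first4_le(data: bytes) -> int:
    """First 4 bytes of the SHA-1 digest, read as a little-endian uint32."""
    return struct.unpack("<I", hashlib.sha1(data).digest()[:4])[0]


def first4_be(data: bytes) -> int:
    """Same 4 bytes, read as a big-endian uint32 instead."""
    return struct.unpack(">I", hashlib.sha1(data).digest()[:4])[0]


if __name__ == "__main__":
    sample = b"example"
    # The two integers are byte-swapped versions of each other.
    print(hex(first4_le(sample)), hex(first4_be(sample)))
```

Both readings use the same four digest bytes; only the integer they are interpreted as differs, which matters when hash values have to match another implementation.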
-
### 🚀 The feature, motivation and pitch
Great feedback from one of our users:
> For our production monitoring, it'd be great to have more operational metrics for us to see the health, utilization, …
-
Jaccard similarity score would be a good option. See the example Python code [here](https://stackoverflow.com/questions/46975929/how-can-i-calculate-the-jaccard-similarity-of-two-lists-containing-strings…
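
For reference, a minimal sketch of Jaccard similarity on two lists of strings, written independently of the linked answer. The function name `jaccard_similarity` is illustrative, and returning 1.0 for two empty inputs is a convention chosen here, not something the thread specifies.

```python
def jaccard_similarity(a, b) -> float:
    """Return |A ∩ B| / |A ∪ B| for two iterables of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty inputs are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Shared items: {"b", "c"}; union: {"a", "b", "c", "d"}
    print(jaccard_similarity(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```

Duplicates within a list are ignored because both inputs are converted to sets first.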
-
Right now I have a database of documents, and each day new documents enter the database. Let's say that up to a certain day I have all the MinHash functions for each document in my database (corpus).
…
-
How can I connect to AWS Keyspaces (Cassandra), given that it asks for an SSL certificate and the service's user name and password? How do I pass these to MinHashLSH's constructor? The way to connect to AWS Cassandra using pyt…
-
Is there a distributed implementation of MinHashLSH? If not, is it possible to shard the base dataset across several machines?
For example, if my dataset has 10 billion records, which can't fit in the memor…
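
As a rough illustration of the sharding idea in this question, the sketch below routes each record to a shard by a stable hash of its key and fans a query out to every shard. It is not an existing distributed MinHashLSH: `shard_for` and `ShardedIndex` are hypothetical names, the shards are plain in-memory dicts standing in for separate machines, and an exact Jaccard scan stands in for whatever per-shard LSH index would be used in practice.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (same on every machine)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedIndex:
    """Toy index: each shard is a dict that would live on its own machine."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def insert(self, key: str, items) -> None:
        shard = self.shards[shard_for(key, len(self.shards))]
        shard[key] = frozenset(items)

    def query(self, items, threshold: float):
        """Ask every shard, then merge the per-shard results."""
        query_set = frozenset(items)
        hits = []
        for shard in self.shards:
            # Placeholder for a per-shard LSH lookup: exact Jaccard scan.
            for key, stored in shard.items():
                union = query_set | stored
                if union and len(query_set & stored) / len(union) >= threshold:
                    hits.append(key)
        return sorted(hits)
```

Because records are partitioned by key rather than by content, a query cannot know which shard holds its near-duplicates, so it has to visit all shards and combine their answers.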