-
I'm using this script as a base for a project https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/clustering/fast_clustering.py
When I set the threshold value high, …
-
The boruvka_balltree algorithm produces incorrect single linkage trees for some data sets with duplicate entries. The incorrect single linkage trees result in incorrect clusterings for these data set…
-
I would like to first thank you all for the very useful library you have created, thank you very much!
To the question: I am using HDBSCAN to cluster data that I am generating from a simulation tha…
-
**Hi Maarten!** Thank you for this awesome library, which makes topic modelling so much easier. I am really impressed by this library; it seems to be the best compared to other topic modelling techniques. My fo…
-
The clustering methods currently supported in v0.1 are a bit limited.
I suggest adding density-based methods such as those implemented in dbscan.
-
I tested multiple combinations of `min_dist` and `n_neighbors` on my data and found that a suitable combination of these hyperparameters can separate all the clusters (according to the labels). I wonder ho…
-
I would like to use some precomputed distance matrices in HDBSCAN, and later predict the cluster memberships of the new datapoints. I understand that this is currently not possible, as HDBSCAN cannot …
-
Using bertopic==0.16.0 on a macOS M1 machine, I have found some very strange behavior in the probabilities for each topic.
```
dataset = load_dataset("CShorten/ML-ArXiv-Papers")["train"]
docs = …
```
-
Hi,
I have a problem that requires similar sentences to be grouped together. Can I use clustering algorithms like DBSCAN or HDBSCAN to cluster the embeddings?
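
To make the question concrete, here is a minimal sketch of density-based clustering over embeddings using cosine distance. It is a toy pure-Python DBSCAN, not the scikit-learn or hdbscan implementation. The three-dimensional `embeddings`, the `eps` value and the `min_samples` value are made-up assumptions for illustration; real sentence embeddings have hundreds of dimensions and would need their own tuning.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(points, eps, min_samples):
    """Toy DBSCAN: returns one label per point, with -1 marking noise."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbors = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, so keep expanding
    return labels


# Hypothetical toy "embeddings": two tight groups and one outlier.
embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

labels = dbscan(embeddings, eps=0.1, min_samples=2)
print(labels)
```

Points that no dense region can reach keep the label `-1`, so sentences that are not similar to anything else are left unassigned rather than forced into a group. In practice a library implementation would likely be preferable to this sketch.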