-
**While creating UMAP embeddings for HDBSCAN clustering, I am getting this user warning**: `UserWarning: WARNING: spectral initialisation failed! The eigenvector solver
failed. This is likely due to …
-
Hi Maarten,
could you tell me why BERTopic should be preferred over other topic modeling techniques like LDA and NMF? I know that these techniques are more time-intensive because of hyperparameters …
-
**Is your feature request related to a problem? Please describe.**
When implementing the HDBSCAN clustering algorithm, there's no way to know the clusters' centroids that the model has generated thro…
-
Hello,
When I save a BERTopic model trained on 3.5M texts, the resulting file is about 25 GB.
I have memory (RAM) concerns when running inference with the model (*).
I am trying to lower the model si…
-
I'm trying out an algorithm for clustering texts called [top2vec](https://github.com/michalovadek/top2vecr), implemented by @michalovadek.
This algorithm first applies doc2vec to the texts to get document e…
-
I understand that finding good values for `n_neighbors` and `min_topic_size` is difficult, as it depends on the number of documents, the length of those documents, and how many topics one would like …
-
The current set of public examples includes the _5d kinematic clustering_ notebook, which uses `hdbscan`.
The current deployment configuration is not large enough to support this, and the notebook fa…
-
For a trained HDBSCAN object, I would like to predict the cluster for new data points similar to what is described [here](https://hdbscan.readthedocs.io/en/latest/prediction_tutorial.html). I see that…
-
Are there any plans to get HDBSCAN implemented?
https://github.com/scikit-learn-contrib/hdbscan
-
Hi Thomas. Thank you for open sourcing this library, it's very useful.
I found your idea of using Isolation Forest for downsampling the observations passed to SHAP very interesting. I'm wondering i…