-
I was generating the heatmap using the `self.model.visualize_heatmap()` method and noticed that the visualization doesn't match the distance values.
I think the issue is that you are including `-1…
-
Sorry to bother you. While studying BERTopic, I noticed that
some topics like "person_people" are not "stand-out" topics (compared with "player_team_football") for the model and are assigned to topic -1
but they …
-
Hi Maarten, thank you as usual for your invaluable help.
I tried using online topic modeling on my dataset of 2 million tweets. Unfortunately, I believe using MiniBatchKMeans creates some problems - a…
-
Would https://github.com/scikit-learn-contrib/hdbscan be a good candidate for replacing the current clustering algorithm?
I'm just looking at https://hdbscan.readthedocs.io/en/latest/comparing_clus…
-
**Describe the bug**
I'm running Recognize against ~35k images. It's creating way too many clusters, currently above 7k and growing.
```
MariaDB [nextcloud]> select count(*) from oc_recognize_fac…
-
> > import numpy as np
> > probability_threshold = 0.01
> > new_topics = [np.argmax(prob) if max(prob) >= probability_threshold else -1 for prob in probs]
>
> This code indeed does not change the…
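
For reference, the thresholding in the quoted one-liner can be sketched as a small vectorized function. This is only an illustration: it assumes `probs` is a 2D array with one row of topic probabilities per document, and both the helper name `assign_topics` and the sample values are hypothetical.

```python
import numpy as np

def assign_topics(probs, probability_threshold=0.01):
    """Return the most likely topic per document, or -1 when no
    probability in that row reaches the threshold."""
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)                             # index of the largest probability per row
    confident = probs.max(axis=1) >= probability_threshold  # rows that clear the threshold
    return np.where(confident, best, -1).tolist()

# Made-up probabilities for three documents and three topics.
example_probs = [
    [0.70, 0.20, 0.10],
    [0.004, 0.003, 0.002],  # nothing reaches 0.01, so this row becomes -1
    [0.05, 0.90, 0.05],
]
new_topics = assign_topics(example_probs)
```

Raising `probability_threshold` sends more documents to `-1`; lowering it sends fewer.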
-
What is the correct way to predict a label for new documents if I have a fitted `topic_model`? If I use `transform` on new documents, it always returns a label of `-1`. Is it more correct to use `find_…
-
Hi,
thanks for sharing these projects, super neat work!
I just wanted to ask what the main differences are between KeyBERT and [BERTopic](https://github.com/MaartenGr/BERTopic).
The two approach…
-
Hi there,
I was really happy to find a Java HDBSCAN implementation, but at the same time I'm a little sad: why are enums used for the initialization? DistanceType and NeighboursQueryFactoryT…
-
Hi,
I'm wondering what the differences between "probabilities" calculated from the BERTopic model and the LDA model are. (or do they mean the same thing?)
I'm a beginner in this field and what …