-
Hi Maarten, and hi to the whole community.
I am writing because I am trying to use BERTopic to analyse about 12,000 maintenance-related documents. First of all I pre-processed the data using algorith…
-
I encountered an IndexError when generating the heatmap visualization. Here is the sequence of steps I took and the resulting error.
I instantiated a BERTopic model and ran the fit_transform() method as follo…
-
I think we should add HDBSCAN. The original paper is from 2013 and @lmcinnes's accelerated version is from 2017; the original paper has 300 citations, the 2017 JOSS paper about the implementation has 100…
-
Hi,
I wanted to perform hierarchical topic modeling on my model. The model is a fitted and transformed BERTopic model that uses the default UMAP and HDBSCAN and also uses the default sentence-trans…
-
Hi,
I set nr_topics to 'auto' in:
```
self.topic_model = BERTopic(
    embedding_model=self.embedding_model,  # Step 1 - Extract embeddings
    umap_model=…
-
1. If a specific number of topics is passed via nr_topics, how can I determine whether the selected number is reasonable? Is there an index to evaluate the quality of the model with different numbers of to…
-
We are experimenting with using PCA and UMAP to reduce the dimensionality of the entity embeddings before clustering.
Running full PCA on the whole set of entities could be computationally intensive. Two solu…
-
Dear all,
I am facing major issues working with BERT. I have a dataset of around 1 million tweets. First, I want to train my model on 50 percent of my dataset; then, in the second step, I want …
-
Hi,
I am going to use online topic modeling in my project on a dataset of 360 documents. I use the code below, copied from the BERTopic webpage:
` # Incrementally fit the topic model by t…
-
Hi, I am trying to use BERTopic on a small number of documents (say, ~20-30). These articles discuss different aspects of vaccinations/animal testing. Most of the time, the model returns zero topics …