daianaccrisan opened this issue 7 months ago
You are not passing a UMAP model to BERTopic, so it falls back to its default UMAP, which is stochastic and can produce different results on every run. Please make sure to follow the best practices guide here: https://maartengr.github.io/BERTopic/getting_started/best_practices/best_practices.html or the FAQ here: https://maartengr.github.io/BERTopic/faq.html#why-are-the-results-not-consistent-between-runs
Hello Maarten!
These are the configs I am using for my model, which runs on a dataset of news articles. With the default min_cluster_size I get 200+ topics, but when I run it a second time I get only five topics (for 7,500 documents). I tried different values for min_cluster_size, and whatever number I give (30, 100) I get 3 topics.
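```python
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer

hdbscan_model = HDBSCAN(min_cluster_size=20, prediction_data=True)
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
vectorizer_model = CountVectorizer(stop_words="english")

topic_model = BERTopic(
    # … (remaining arguments not shown in the post)
)
```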
Could you please tell me if I am doing something wrong? I am running the code in Google Colab and have used BERTopic in this environment before, but my results have never differed so much from one run to another.
Best regards, Daiana
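When comparing topic counts between runs, it may help to exclude the outlier topic: BERTopic assigns documents that HDBSCAN treats as noise to topic `-1`, which is not a real cluster (it also appears as a row in `get_topic_info()`). The helper below is a hypothetical stdlib-only sketch, assuming `topics` is the list of topic ids returned by `fit_transform`.

```python
from collections import Counter


def summarize_topics(topics: list[int]) -> tuple[int, int]:
    """Return (number of real topics, number of outlier documents).

    Topic -1 holds the outliers, so it is excluded from the topic count.
    """
    counts = Counter(topics)
    outliers = counts.pop(-1, 0)  # remove the outlier bucket if present
    return len(counts), outliers


# Example with made-up labels for six documents
print(summarize_topics([0, 1, -1, 0, 2, -1]))  # (3, 2)
```

A sharp rise in the outlier count alongside a drop to a handful of topics would suggest that most documents are being labelled as noise rather than merged into a few large clusters.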