The same happened to me. I had about 30K documents in total, and roughly 50% of them were outliers. When I randomly checked some of the outlier documents, I found many related documents. Any suggestions would help.
To reduce the number of outliers, I would refer you to the FAQ. It describes three strategies for reducing outliers in BERTopic: adjusting the HDBSCAN parameters, using .reduce_outliers, and using a different clustering algorithm.
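To make the second strategy more concrete, here is a minimal sketch of the general idea behind outlier reduction: documents labeled -1 are reassigned to the most similar existing topic if the similarity clears a threshold. This is not BERTopic's implementation. The helper names (cosine, centroids, reassign_outliers), the toy 2-D embeddings, and the threshold value are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroids(embeddings, topics):
    """Mean embedding per topic, ignoring the outlier topic -1."""
    sums, counts = {}, {}
    for vec, topic in zip(embeddings, topics):
        if topic == -1:
            continue
        if topic not in sums:
            sums[topic] = [0.0] * len(vec)
            counts[topic] = 0
        sums[topic] = [s + v for s, v in zip(sums[topic], vec)]
        counts[topic] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def reassign_outliers(embeddings, topics, threshold=0.0):
    """Move each -1 document to its most similar topic centroid.

    A document stays at -1 when even its best match is below `threshold`
    (a hypothetical cutoff chosen for this sketch).
    """
    cents = centroids(embeddings, topics)
    new_topics = list(topics)
    for i, (vec, topic) in enumerate(zip(embeddings, topics)):
        if topic != -1 or not cents:
            continue
        best_topic, best_sim = max(
            ((t, cosine(vec, c)) for t, c in cents.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            new_topics[i] = best_topic
    return new_topics


# Toy data: two clear topics plus two outliers.
embeddings = [
    [1.0, 0.0], [0.9, 0.1],    # topic 0
    [0.0, 1.0], [0.1, 0.9],    # topic 1
    [0.8, 0.2],                # outlier that sits close to topic 0
    [-1.0, -1.0],              # outlier that resembles neither topic
]
topics = [0, 0, 1, 1, -1, -1]

new_topics = reassign_outliers(embeddings, topics, threshold=0.5)
print(new_topics)  # [0, 0, 1, 1, 0, -1]
```

In BERTopic itself you would not write this by hand: .reduce_outliers takes your documents and the current topic assignments and returns new assignments, which you can then pass to .update_topics to refresh the topic representations. Raising the threshold keeps more documents in -1; lowering it reassigns more aggressively.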
Very, very helpful, thank you very much
Thanks a lot
Hello MaartenGr, I did not set the nr_topics parameter when using BERTopic on my data (30,000 entries). I ended up with 512 topics, but a lot of the data (10,000 items) was assigned to topic -1. Manual inspection shows that many of the documents labeled -1 actually belong to other topics. What parameters can I adjust to solve this problem? Or rather, which steps or parameters affect the final number of topics when nr_topics is not set?