minghehe-nobug closed 1 year ago
If you want to remove the -1 topic from the model, it might be worthwhile to use a different clustering algorithm. HDBSCAN assigns documents it cannot confidently place in a cluster to the outlier topic -1, which can be useful in some cases but is not always what you want. Instead, use a clustering algorithm like k-Means, which lets you select the number of topics and does not generate the -1 topic.
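As a rough sketch of the k-Means suggestion above: BERTopic accepts a scikit-learn style clustering model through its `hdbscan_model` argument. The helper name `build_kmeans_topic_model`, the default of 50 topics, and the fixed `random_state` are assumptions made for illustration, so pick a topic count that suits your own data.

```python
def build_kmeans_topic_model(n_topics: int = 50):
    """Return a BERTopic model that clusters with k-Means instead of HDBSCAN.

    `n_topics` is an illustrative default, not a recommended value.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans

    # k-Means assigns every document to exactly one cluster,
    # so no outlier (-1) topic is produced.
    cluster_model = KMeans(n_clusters=n_topics, random_state=42)

    # The argument is still called `hdbscan_model`, even for other clusterers.
    return BERTopic(hdbscan_model=cluster_model)


# Example usage (assumes `docs` is your own list of strings):
# topic_model = build_kmeans_topic_model(n_topics=50)
# topics, probs = topic_model.fit_transform(docs)
```

Keep in mind that k-Means forces every document into some topic, so documents that HDBSCAN would have treated as noise will now end up inside the topics.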
It might be helpful to dive into the documents that are found within your topic to see how the representations are generated. A personal favorite of mine is to visualize both documents and topics.
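A minimal sketch of that kind of inspection, assuming `topic_model` is an already fitted BERTopic model and `docs` is the list of documents it was trained on. The helper name `inspect_topic` and the 200-character preview length are made up for this example.

```python
def inspect_topic(topic_model, docs, topic_id=0, preview_chars=200):
    """Print representative documents for one topic and build two overview plots.

    `topic_model` is assumed to be a fitted BERTopic model and `docs`
    the documents it was fitted on.
    """
    # Documents that BERTopic considers most representative of the topic.
    for doc in topic_model.get_representative_docs(topic_id):
        print(doc[:preview_chars])

    # 2D scatter plot of the documents, colored by topic.
    doc_fig = topic_model.visualize_documents(docs)

    # Intertopic distance map showing how the topics relate to each other.
    topic_fig = topic_model.visualize_topics()

    return doc_fig, topic_fig


# Example usage:
# doc_fig, topic_fig = inspect_topic(topic_model, docs, topic_id=0)
# doc_fig.show()
# topic_fig.show()
```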
The most important thing to note here is that you first need to understand the topics and documents in order to understand not only what the model is doing but also how it generates its clusters and what you can do to improve them. You can find a bunch of tips here and some frequently asked questions here.
Thanks very much! I'll try k-Means and dig deeper into learning this project!
Sorry to bother you. While studying BERTopic, I noticed that some topics, like "person_people", are not "stand-out" topics for the model (compared with "player_team_football") and get assigned to topic -1, even though they clearly represent a class. Is there a way to get all topics without the -1 topic? Thanks a lot!