MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

How can I use BERTopic to classify using the information in the (-1) topic? #829

Closed minghehe-nobug closed 1 year ago

minghehe-nobug commented 1 year ago

Sorry to bother you. While studying BERTopic, I noticed that some topics, like "person_people", are not "stand-out" topics for the model (compared with "player_team_football") and get assigned to topic -1, even though they surely represent a class. Is there a way to get all topics without the -1 topic? Thanks a lot!

MaartenGr commented 1 year ago

If you want to remove the -1 topic from the model, it might be worthwhile to use a different clustering algorithm. HDBSCAN generates outliers, which can be useful in some cases but is not always preferred. Instead, use a clustering algorithm like k-Means, which lets you select the number of topics and does not generate the -1 topic.
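For anyone landing here later, a minimal sketch of swapping HDBSCAN for k-Means might look like the following. It assumes `bertopic` and `scikit-learn` are installed and that `docs` is a list of strings. The helper names (`outlier_fraction`, `fit_kmeans_topics`) and the defaults `n_topics=20` and `seed=42` are illustrative choices, not part of BERTopic.

```python
from collections import Counter


def outlier_fraction(topics):
    """Return the share of documents assigned to the outlier topic (-1)."""
    if not topics:
        return 0.0
    return Counter(topics)[-1] / len(topics)


def fit_kmeans_topics(docs, n_topics=20, seed=42):
    """Fit BERTopic with k-Means as the clustering step (hypothetical helper).

    n_topics and seed are illustrative defaults; tune them for your data.
    """
    # Imported here because both packages are third-party and slow to import.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans

    cluster_model = KMeans(n_clusters=n_topics, random_state=seed)
    # Despite its name, the `hdbscan_model` argument also accepts other
    # clustering models such as scikit-learn's KMeans.
    topic_model = BERTopic(hdbscan_model=cluster_model)
    topics, _ = topic_model.fit_transform(docs)
    return topic_model, topics
```

Running `outlier_fraction(topics)` on the output of a default HDBSCAN-based model and again on the k-Means one is a quick way to see how many documents were sitting in -1; with k-Means every document is assigned to a cluster, so the value should be 0.0.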

It might be helpful to dive into the documents that are found within your topic to see how the representations are generated. A personal favorite of mine is to visualize both documents and topics.
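As a rough sketch of that inspection step, assuming `topic_model` is an already fitted BERTopic model and `docs` and `topics` are the documents and the topic assignments returned by `fit_transform`, something like this could help. The helper names `group_docs_by_topic` and `build_figures` are made up for illustration.

```python
from collections import defaultdict


def group_docs_by_topic(docs, topics):
    """Map each topic id to the list of documents assigned to it."""
    grouped = defaultdict(list)
    for doc, topic in zip(docs, topics):
        grouped[topic].append(doc)
    return dict(grouped)


def build_figures(topic_model, docs):
    """Build the document-level and topic-level plots for a fitted model."""
    # Both methods return Plotly figures; call .show() on them in a notebook.
    documents_fig = topic_model.visualize_documents(docs)
    topics_fig = topic_model.visualize_topics()
    return documents_fig, topics_fig
```

For example, `group_docs_by_topic(docs, topics).get(-1, [])` lists the documents that ended up in the outlier topic, which makes it easier to judge whether something like "person_people" really forms a class of its own.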

The most important thing to note here is that getting an understanding first of the topics and documents is necessary in order to understand not only what the model is doing but also how it is generating its clusters and the things you can do to improve that. You can find a bunch of tips here and some frequently asked questions here.

minghehe-nobug commented 1 year ago

Thanks very much! I'll try k-Means and dig deeper into learning this project!