MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
5.79k stars, 721 forks

merge_models create new outlier topic #1876

Open dannylesmy opened 3 months ago

dannylesmy commented 3 months ago

Hi Maarten,

I've been reducing outliers by running a second topic model exclusively on them, and it has been effective. However, when I merge the models, the merged model contains an outlier topic (-1) that is not present in all of the input models.

For instance, my initial run assigned 83 documents to topic -1. I then ran a second model on only those topic -1 documents, which produced two topics covering all of them, so the second model has no outliers. After merging the two models with `merge_models`, however, topic -1 in the merged model is double the size it was in the first run: 166 documents now belong to topic -1.

Do you have any insight into what might cause this discrepancy? (Screenshots were attached for the first model, the second model fitted on the outliers alone, and the merged model.)

Thank you!
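
For reference, the workflow described above could be sketched roughly as follows. This is a minimal sketch and not the code actually used in the report: the helper names (`outlier_docs`, `topic_sizes`, `fit_and_merge`) are hypothetical, the models use default settings, and `docs` is assumed to be a list of strings large enough for BERTopic to cluster.

```python
from collections import Counter


def outlier_docs(docs, topics):
    """Return the documents that were assigned to the outlier topic (-1)."""
    return [doc for doc, topic in zip(docs, topics) if topic == -1]


def topic_sizes(topics):
    """Count how many documents were assigned to each topic id."""
    return Counter(topics)


def fit_and_merge(docs):
    """Fit a model, refit on its outliers only, then merge both models.

    Hypothetical helper: default BERTopic settings are an assumption,
    not the configuration from the original report.
    """
    from bertopic import BERTopic  # imported lazily; requires bertopic to be installed

    first = BERTopic()
    topics, _ = first.fit_transform(docs)

    # Second model sees only the documents the first model left in topic -1.
    second = BERTopic()
    second.fit_transform(outlier_docs(docs, topics))

    # merge_models combines the fitted models into a new model.
    merged = BERTopic.merge_models([first, second])
    return first, second, merged
```

Comparing the `Count` of topic -1 in `first.get_topic_info()` and `merged.get_topic_info()` would then show whether the outlier topic grows after merging, as it did in the report (83 versus 166 documents).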

MaartenGr commented 3 months ago

Hmmm, not sure what is happening here. There are a couple of fixes for merging topics in the main branch, so could you try installing BERTopic from there first? Hopefully, that already resolves the issue. I'm aiming for a minor release sometime this month or next, but there are some fixes that I would like to double-check first, and this could be one of them.

dannylesmy commented 3 months ago

Yes, I'll look at it. Thank you again for your response.