MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Output of the visualize_distribution() for noise/outlier document #1424

Open terilias opened 1 year ago

terilias commented 1 year ago

Hello, first of all, thank you for publishing this powerful library! My question is about the bar plot produced by the visualize_distribution() function. I am calling it for a document that has been assigned the topic "-1", but the topic "-1" is not shown in the bar plot. A different topic seems to have the highest probability instead of the "-1" topic. My guess is that the chart shows the probabilities only for the rest of the topics (those that are not "-1"), but I did not find that in the function documentation, so I thought I'd ask here. [Screenshot: Screenshot_20230723_150116]

MaartenGr commented 1 year ago

That's correct! The probability of the -1 topic is not output directly and should be calculated as 1 - sum(probabilities). As a result, only the probabilities of the "actual" topics/clusters are shown.
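
For illustration, the 1 - sum(probabilities) calculation could look roughly like the sketch below. It uses plain NumPy rather than BERTopic itself; the helper name `outlier_probability` and the example values are hypothetical, and it assumes the input is one document's probability vector over the non-outlier topics (for example, one row of the probabilities returned when the model is fitted with calculate_probabilities=True).

```python
import numpy as np


def outlier_probability(topic_probs):
    """Return the implied probability of topic -1 for a single document.

    `topic_probs` is assumed to hold the document's probabilities for the
    non-outlier topics only (hypothetical helper, not part of BERTopic).
    """
    probs = np.asarray(topic_probs, dtype=float)
    # Clip at 0 in case floating-point error pushes the sum slightly above 1.
    return float(max(0.0, 1.0 - probs.sum()))


# Made-up probabilities over three "actual" topics for one document.
doc_probs = [0.10, 0.25, 0.05]
p_outlier = outlier_probability(doc_probs)
print(f"Implied probability of topic -1: {p_outlier:.2f}")
```

With these made-up values the remaining mass is larger than any single topic's probability, which would be consistent with the document being labelled -1 even though the bar plot only shows the other topics.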

terilias commented 1 year ago

Thank you for the answer!