Have you searched existing issues? 🔎
Describe the bug
Hi everyone, first of all I would like to thank @MaartenGr and all the contributors for this amazing project. For my project, I need to calculate the entropy of each topic. Could you explain how to calculate entropy in BERTopic? I tried to compute it from `probs`, but I ran into a problem: `probs` comes back as a 1-dimensional array, while my code requires a 2-dimensional document-topic matrix. Thank you very much!
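For reference, here is a minimal sketch of how I would expect to obtain a 2-D document-topic matrix. It assumes fitting with `calculate_probabilities=True` (which, as I understand it, makes `probs` an array of shape `(n_documents, n_topics)`); the 20 newsgroups corpus is used only as an example dataset:

```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Example corpus; any list of strings should work
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

# Assumption: calculate_probabilities=True yields a 2-D probability matrix
topic_model = BERTopic(calculate_probabilities=True)
topics, probs = topic_model.fit_transform(docs)

print(probs.shape)  # expected: (n_documents, n_topics)
```

Without `calculate_probabilities=True`, my understanding is that `probs` only holds the probability of each document's assigned topic, which would explain why it comes back as a 1-D array.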
Reproduction
import numpy as np
import pandas as pd

# `probs` comes from topic_model.fit_transform(docs); it is 1-D here, which breaks axis=1 below
doc_topic_matrix = np.array(probs)
# Normalize each document's topic probabilities so they sum to 1
normalized_doc_topic_matrix = doc_topic_matrix / doc_topic_matrix.sum(axis=1, keepdims=True)
# Shannon entropy of each topic across documents (1e-9 avoids log2(0))
topic_entropy = (-normalized_doc_topic_matrix * np.log2(normalized_doc_topic_matrix + 1e-9)).sum(axis=0)
entropy_df = pd.DataFrame({'Topic': range(len(topic_entropy)), 'Entropy': topic_entropy})
# `topic_freq` is assumed to come from topic_model.get_topic_freq()
topic_freq['Entropy'] = entropy_df['Entropy'].values
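As an alternative, here is a sketch of the same per-topic entropy calculation on a document-topic matrix obtained from `topic_model.approximate_distribution`; treating that method as a workaround for the 1-D `probs` is my own assumption. `topic_model` and `docs` are as in the snippet above:

```python
import numpy as np
import pandas as pd

# topic_distr has shape (n_documents, n_topics); the second return value is unused here
topic_distr, _ = topic_model.approximate_distribution(docs)

# Normalize per document, guarding against rows that sum to zero
row_sums = topic_distr.sum(axis=1, keepdims=True)
normalized = topic_distr / np.where(row_sums == 0, 1, row_sums)

# Shannon entropy of each topic across documents (1e-9 avoids log2(0))
topic_entropy = (-normalized * np.log2(normalized + 1e-9)).sum(axis=0)

entropy_df = pd.DataFrame({"Topic": range(len(topic_entropy)),
                           "Entropy": topic_entropy})
print(entropy_df.head())
```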
BERTopic Version
0.16.4