Probability Distribution

Hi, I used tm.get_document_info(doc) to get the document information. For each document, I got a probability value. As I understand it, this value is the probability that the document belongs to its assigned topic, which is the topic with the highest probability among all topics. My BERTopic results contain 21 topics. For example, document one belongs to topic -1 with a reported probability of 0.441466441. I then ran the following code to get the probability distribution:

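    import pandas as pd

    # approximate_distribution returns a tuple; element 0 holds the per-document topic distributions
    df = tm.approximate_distribution(doc)
    df_prob = pd.DataFrame(df[0])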

The results for the first document are as follows:

0.030 | 0.069 | 0.084 | 0.052 | 0.052 | 0.047 | 0.028 | 0.019 | 0.084 | 0.052 | 0.022 | 0.019 | 0.086 | 0.068 | 0.030 | 0.112 | 0.059 | 0.000 | 0.043 | 0.030 | 0.014
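
As a quick sanity check on the row above, here is a minimal sketch in plain Python (no BERTopic needed) that counts the values, sums them, and finds the largest one. The variable names are only illustrative. It assumes the columns are ordered by topic index, which is not confirmed here, and it says nothing about how the outlier topic -1 relates to these columns.

```python
# Values copied from the first row of df_prob shown above.
row = [
    0.030, 0.069, 0.084, 0.052, 0.052, 0.047, 0.028,
    0.019, 0.084, 0.052, 0.022, 0.019, 0.086, 0.068,
    0.030, 0.112, 0.059, 0.000, 0.043, 0.030, 0.014,
]

# Number of values in the row.
n_values = len(row)

# Sum of the row, rounded to the precision of the printed values.
total = round(sum(row), 3)

# Position and size of the largest value in the row.
# Assumption: the position corresponds to a topic index (not confirmed).
best_index = max(range(n_values), key=row.__getitem__)
best_value = row[best_index]

print(n_values, total, best_index, best_value)
```

If the row sums to about 1, it is a distribution spread over all columns, so no single entry needs to match the 0.44 reported for the assigned topic.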

There are 21 values in total. My question is: are these 21 values the probabilities that document one belongs to each of the 21 topics? With tm.get_document_info(doc), the first document is assigned to topic -1 with a probability of about 0.44. Why, then, is the first value from tm.approximate_distribution(doc) only 0.030? Can you please help me understand this? I am using BERTopic in my dissertation and need to discuss the probability distribution of the documents. Thank you very much.

MaartenGr / BERTopic

Probability Distribution #1779