Open cuent opened 4 years ago
I think the best topic can be selected according to the task requirements. For example, if you want the topics that are easiest to explain, you can choose by the topic consistency index; if you want a better fit to the data, you can choose by the confusion index.
Could you please elaborate on the consistency/confusion index? I thought it was a way of selecting the most used topics by doc_frequency and topic proportion.
Maybe this question is dumb, but I don't understand why the average of the weighted document-topic proportions is a metric for the most important topics.
From my understanding, the product of each document frequency (`sums`) with the document-topic probabilities `theta` amplifies or reduces the probability based on the actual probability, and the average gives some insight into which topics are important in the whole corpus. Is that right? Also, what would be the difference if we only averaged the document-topic proportions (no weighting)?
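
To make the weighted versus unweighted question concrete, here is a minimal sketch in plain Python. It is not the code from this repository. The toy numbers are made up, and it assumes that `sums` holds one weight per document (for example its token count) and that each row of `theta` sums to 1. The helper names `unweighted_topic_share` and `weighted_topic_share` are hypothetical.

```python
# Hypothetical toy data: 3 documents, 2 topics.
# theta[d][k] is the share of document d assigned to topic k (each row sums to 1).
theta = [
    [0.9, 0.1],  # long document, mostly topic 0
    [0.2, 0.8],  # short document, mostly topic 1
    [0.1, 0.9],  # short document, mostly topic 1
]

# Assumption: `sums` is a per-document weight such as the document's token count.
sums = [1000, 50, 50]


def unweighted_topic_share(theta):
    """Plain average of the document-topic proportions: every document counts equally."""
    n_docs = len(theta)
    n_topics = len(theta[0])
    return [sum(row[k] for row in theta) / n_docs for k in range(n_topics)]


def weighted_topic_share(theta, weights):
    """Weighted average: each document contributes in proportion to its weight."""
    total = sum(weights)
    n_topics = len(theta[0])
    return [
        sum(w * row[k] for w, row in zip(weights, theta)) / total
        for k in range(n_topics)
    ]


unweighted = unweighted_topic_share(theta)
weighted = weighted_topic_share(theta, sums)

print("unweighted:", unweighted)  # topic 1 ranks first: two of three documents favor it
print("weighted:  ", weighted)    # topic 0 ranks first: the long document dominates
```

Under these assumptions, the unweighted average answers "which topic is prominent in the most documents", while the weighted average answers "which topic accounts for the most text in the corpus". Dividing the weighted sum by the number of documents instead of by the total weight changes only the scale of the scores, not the ranking of the topics.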