BabakHemmatian / Gay_Marriage_Corpus_Study

LDA and RNN for Reddit comments
Calculate topic coherence metrics for topic models with varying num_topics #22

Closed: sabjoslo closed this issue 6 years ago

sabjoslo commented 6 years ago

To compare the "quality" of LDA models with different numbers of topics, calculate the topic coherence of models with 25, 50, 75, and 100 topics (example here).

sabjoslo commented 6 years ago

Did you calculate these? What were the results?

BabakHemmatian commented 6 years ago

Unfortunately, the results weren't very helpful. The fewer the topics, the better the UMass coherence, even down to 2 topics. Extrinsic measures wouldn't be easy to calculate either, given our dataset and goals: we would need an external corpus of same-sex-marriage-related discussions. That said, the coherence value for 50 topics was quite acceptable, so I think we should go with that.
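
For reference, here is a minimal sketch of how a UMass coherence score can be computed for a single topic, following the formulation in Mimno et al. (2011). It is not the code used in this repository: the function name `umass_coherence`, the `eps` smoothing parameter, and the toy documents are all assumptions made for illustration, and topic-modeling libraries may normalise or average the pairwise terms differently.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents, eps=1.0):
    """UMass coherence for one topic (hypothetical helper, illustration only).

    top_words: the topic's top words, most probable first.
    documents: iterable of tokenised documents (lists of strings).
    eps: smoothing count added to the co-document frequency.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() keeps list order, so `higher` always outranks `lower`.
    for higher, lower in combinations(top_words, 2):
        denom = doc_freq(higher)
        if denom == 0:
            continue  # word never occurs in the corpus; skip the pair
        score += math.log((doc_freq(higher, lower) + eps) / denom)
    return score


# Toy corpus, invented for this example.
docs = [
    ["marriage", "equality", "court"],
    ["marriage", "court", "ruling"],
    ["church", "religion", "marriage"],
    ["church", "religion", "faith"],
]

related = umass_coherence(["marriage", "court"], docs)
unrelated = umass_coherence(["court", "faith"], docs)
print(related, unrelated)
```

Scores are at most zero when `eps` is 1, and values closer to zero indicate top words that co-occur in more documents. A per-model score could then be taken as the mean over that model's topics, which is one way to compare models with 25, 50, 75, and 100 topics.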