KarinaBunyik / Twitter_hidden_topics

Finding Twitter topics that do not appear in other news media

Number of topics #24

Closed KarinaBunyik closed 10 years ago

KarinaBunyik commented 11 years ago

Figure out how to test different numbers of topics in MALLET LDA.

KarinaBunyik commented 10 years ago

From the MALLET documentation for `--num-topics`: "The number of topics to use. The best number depends on what you are looking for in the model. The default (10) will provide a broad overview of the contents of the corpus. The number of topics should depend to some degree on the size of the collection, but 200 to 400 will produce reasonably fine-grained results."
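A minimal sketch of how runs with different topic counts could be set up, assuming MALLET is invoked through its `train-topics` command. The input file `tweets.mallet`, the output file names, the `bin/mallet` path, and the candidate topic counts are placeholders chosen for illustration, not values from this project.

```python
from typing import List


def mallet_train_command(input_path: str, num_topics: int,
                         mallet_bin: str = "bin/mallet") -> List[str]:
    """Build the argument list for one `mallet train-topics` run."""
    return [
        mallet_bin, "train-topics",
        "--input", input_path,
        "--num-topics", str(num_topics),
        # Output names are hypothetical; one set of files per topic count.
        "--output-topic-keys", f"topic_keys_{num_topics}.txt",
        "--output-doc-topics", f"doc_topics_{num_topics}.txt",
    ]


# Hypothetical candidates spanning the default (10) up to the
# "fine-grained" range (200 to 400) mentioned in the documentation.
commands = [mallet_train_command("tweets.mallet", k)
            for k in (10, 50, 100, 200, 400)]

for cmd in commands:
    print(" ".join(cmd))
    # To actually run MALLET, pass cmd to subprocess.run(cmd, check=True).
```

Each command only differs in `--num-topics` and the output names, so the resulting topic keys can be compared side by side.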

KarinaBunyik commented 10 years ago

Blei has code for determining the number of topics: online HDP (hierarchical Dirichlet process).

KarinaBunyik commented 10 years ago

A relatively simple way to find the optimum number of topics without labeled training data is to loop through models with different numbers of topics and pick the number of topics that gives the maximum log likelihood, given the data.
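The loop described above can be sketched roughly as follows. The scoring function is an assumption: in practice it would train a model with `k` topics (for example with MALLET) and return its log likelihood, ideally on held-out documents, since likelihood on the training data itself tends to keep rising as topics are added. `toy_log_likelihood` is only a stand-in so the sketch runs on its own.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_num_topics(
    candidates: Iterable[int],
    log_likelihood: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Score each candidate topic count and return the best one with all scores."""
    scores = {k: log_likelihood(k) for k in candidates}
    best_k = max(scores, key=scores.get)  # highest log likelihood wins
    return best_k, scores


def toy_log_likelihood(k: int) -> float:
    # Stand-in only: a made-up curve that peaks at k = 50.
    # Replace with a function that trains a model and evaluates it.
    return -1000.0 - (k - 50) ** 2 / 10.0


best_k, scores = select_num_topics([10, 25, 50, 100, 200], toy_log_likelihood)
for k, ll in sorted(scores.items()):
    print(k, ll)
print("best number of topics:", best_k)
```

Keeping the full `scores` dictionary, not just the winner, makes it easy to plot the curve and check whether the maximum is sharp or flat.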