KarinaBunyik closed this issue 10 years ago.
From the MALLET documentation for `--num-topics`: "The number of topics to use. The best number depends on what you are looking for in the model. The default (10) will provide a broad overview of the contents of the corpus. The number of topics should depend to some degree on the size of the collection, but 200 to 400 will produce reasonably fine-grained results."
Blei's group has released code that infers the number of topics from the data instead of fixing it in advance: online HDP (hierarchical Dirichlet process).
A relatively simple way to find the optimal number of topics without additional training data is to loop over models with different numbers of topics and pick the number of topics that gives the maximum log likelihood of the data.
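The loop can be sketched in plain Python. This is a toy illustration, not MALLET: it assumes a hand-rolled collapsed Gibbs sampler for LDA (the functions `gibbs_lda_log_likelihood` and `best_num_topics` are hypothetical), illustrative priors alpha = 0.1 and beta = 0.01, and scores each K by log p(w | z), the log likelihood of the words given the final topic assignments. The tiny corpus is made up.

```python
import math
import random


def gibbs_lda_log_likelihood(docs, num_topics, alpha=0.1, beta=0.01,
                             iterations=200, seed=0):
    """Run collapsed Gibbs sampling for LDA and return log p(w | z)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # topic assignment for every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | z_-i, w), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # log p(w | z): Dirichlet-multinomial likelihood summed over topics.
    ll = K * (math.lgamma(V * beta) - V * math.lgamma(beta))
    for t in topics:
        ll += sum(math.lgamma(n + beta) for n in n_kw[t])
        ll -= math.lgamma(n_k[t] + V * beta)
    return ll


def best_num_topics(docs, candidates, **kwargs):
    """Score each candidate K and return (best K, {K: log likelihood})."""
    scores = {k: gibbs_lda_log_likelihood(docs, k, **kwargs) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy corpus (made up for illustration) with two fairly distinct themes.
docs = [doc.split() for doc in [
    "apple banana fruit apple juice",
    "banana fruit smoothie apple",
    "goal match team score goal",
    "team player match goal win",
    "fruit juice apple banana",
    "score team win player",
]]

best, scores = best_num_topics(docs, [1, 2, 3, 4], iterations=100)
print("best K:", best)
print({k: round(v, 2) for k, v in scores.items()})
```

Likelihood measured on the same documents the model was trained on can favor larger models, so comparing K on held-out documents is often more reliable.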
Figure out how to test different numbers of topics in MALLET's LDA.
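One way to test several topic counts with MALLET is to call its command-line `train-topics` once per candidate value of `--num-topics` and compare the per-token log likelihood it reports. This is a Python wrapper sketched under stated assumptions: the helper names (`last_ll_per_token`, `sweep_num_topics`, `pick_best`) are hypothetical, the corpus is assumed to have been imported with `--keep-sequence`, and the `<N> LL/token: value` log-line format is assumed from typical MALLET output rather than from a documented interface.

```python
import re
import subprocess

# Assumption: MALLET's train-topics logs progress lines such as
# "<1000> LL/token: -8.12345"; adjust the pattern if your version differs.
LL_PATTERN = re.compile(r"<(\d+)> LL/token: (-?\d+(?:\.\d+)?)")


def last_ll_per_token(log_text):
    """Return the last LL/token value in a MALLET log, or None if absent."""
    matches = LL_PATTERN.findall(log_text)
    return float(matches[-1][1]) if matches else None


def sweep_num_topics(mallet_bin, corpus_file, candidates, iterations=1000):
    """Train one MALLET model per K and record its final LL/token."""
    scores = {}
    for k in candidates:
        result = subprocess.run(
            [mallet_bin, "train-topics",
             "--input", corpus_file,        # assumed imported with --keep-sequence
             "--num-topics", str(k),
             "--num-iterations", str(iterations),
             "--random-seed", "1"],         # fixed seed so runs are comparable
            capture_output=True, text=True, check=True)
        # Assumption: MALLET logs to stderr, so search both output streams.
        scores[k] = last_ll_per_token(result.stderr + result.stdout)
    return scores


def pick_best(scores):
    """Return the K with the highest LL/token, ignoring runs with no value."""
    valid = {k: v for k, v in scores.items() if v is not None}
    return max(valid, key=valid.get) if valid else None
```

For example, `pick_best(sweep_num_topics("bin/mallet", "corpus.mallet", [10, 20, 50, 100, 200]))`, where both paths are placeholders. As with any training-set likelihood, LL/token may keep improving as K grows, so treat the result as a starting point rather than a definitive choice.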