If you visit David Blei's page, you'll see papers discussing Hierarchical Dirichlet Processes.
These are nonparametric topic models that can automatically infer the "optimal" number of topics for a given corpus. However, implementing such a model in this package would have taken me too far afield and led to the kind of feature creep I wasn't willing to fully commit to, so I decided not to implement any of them.
Beyond that, there are also metrics such as perplexity (see David Blei's original LDA paper), which measures how well a trained model predicts held-out documents (lower is better). You could use it to compare models trained with different numbers of topics. However, since the output is human-interpretable, it's often easier to just try different numbers of topics and see for yourself what works best.
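As a rough illustration of the perplexity comparison: the original LDA paper defines held-out perplexity as exp(-Σ_d log p(w_d) / Σ_d N_d), where log p(w_d) is the log-likelihood of held-out document d and N_d is its token count. The sketch below assumes you can obtain per-document held-out log-likelihoods from whatever model you train (this package may not expose them directly). The helper names `perplexity` and `pick_num_topics` are hypothetical and not part of any library.

```python
import math
from typing import Dict, Sequence


def perplexity(doc_log_likelihoods: Sequence[float], doc_lengths: Sequence[int]) -> float:
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    Assumes doc_log_likelihoods[i] is the natural-log likelihood of held-out
    document i under a trained model, and doc_lengths[i] is its token count.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("need exactly one log-likelihood per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("held-out set must contain at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_num_topics(perplexity_by_k: Dict[int, float]) -> int:
    """Return the number of topics whose model had the lowest held-out perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

A quick sanity check: a model that assigns every token probability 1/10 has perplexity exactly 10, whatever the document lengths. In practice you would fit one model per candidate topic count, score the same held-out set with each, and still inspect the top words, since the lowest-perplexity setting is not guaranteed to give the most interpretable topics.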
Thanks for your comment. I will try different numbers of topics to determine how many I need.
Hi,
Is there a way to help determine the optimal number of topics? Thanks.