ericproffitt / TopicModelsVB.jl

A Julia package for variational Bayesian topic modeling.

Optimal number of topics #8

Closed grassdew closed 7 years ago

grassdew commented 7 years ago

Hi,

I wonder if there is a way to help determine the optimal number of topics? Thanks.

ericproffitt commented 7 years ago

If you visit David Blei's page, you'll see papers discussing Hierarchical Dirichlet Processes.

Basically, there exist nonparametric topic models which can automatically find the "optimal" number of topics for a given corpus. However, I felt that implementing such a model in this package would take me too far afield and lead to the kind of feature creep I wasn't willing to commit to, so I decided not to implement any of them.

Other than that, there are metrics such as perplexity (see David Blei's original LDA paper) that evaluate how well a fitted model predicts held-out documents, and you could use one to compare models trained with different numbers of topics. However, since the output is human interpretable, it's often easier to just try different numbers of topics and see for yourself what works best.
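
For anyone who wants to try the perplexity comparison, here is a minimal sketch in plain Python. It does not use the TopicModelsVB.jl API: the function names `perplexity` and `best_num_topics` and the inputs `docs`, `theta` and `beta` are hypothetical, and it assumes you have already extracted per-document topic proportions and per-topic word probabilities from each fitted model. It also assumes the topic proportions for the held-out documents were inferred separately, which a real evaluation would need to do.

```python
import math


def perplexity(docs, theta, beta):
    """Perplexity = exp(-total log-likelihood / total token count).

    docs  : list of documents, each a list of word ids (hypothetical input)
    theta : theta[d][k], topic proportions of document d
    beta  : beta[k][w], probability of word w under topic k
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal word probability: mix topics by the document's proportions.
            p_w = sum(theta[d][k] * beta[k][w] for k in range(len(beta)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def best_num_topics(scores):
    """Return the topic count with the lowest perplexity from a {K: perplexity} dict."""
    return min(scores, key=scores.get)
```

Lower perplexity means the model assigns higher probability to the held-out words. You would compute one score per candidate number of topics, collect them in a dict, and pass that to `best_num_topics`. Treat the result as a starting point, since the lowest-perplexity model is not guaranteed to have the most interpretable topics.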

grassdew commented 7 years ago

Thanks for your comment. I will try different numbers of topics to determine how many I need.