Topics in onlinehdp - Githubissues

blei-lab / online-hdp

Online inference for the Hierarchical Dirichlet Process. Fits hierarchical Dirichlet process topic models to massive data. The algorithm determines the number of topics.

GNU General Public License v2.0


Topics in onlinehdp #6

Open Abigale001 opened 6 years ago

Abigale001 commented 6 years ago

In the paper, the authors say that the number of topics can be determined with cross-validation or held-out likelihood. But when I run the code with the default T and K, the number of topics is always equal to T. Does anyone know why?

jorgecastillo2 commented 6 years ago

Just before that, the authors say "In a traditional setting, where fitting multiple models might be viable", and right after it they say: "However, these techniques become impractical when the data set size is large, and they become impossible when the data are streaming. Online HDP provides the speed of online variational Bayes with the modeling flexibility of the HDP."

With T=150, the default output shows 20 topics, and they are ordered by relevance.

zeyd31 commented 5 years ago

@jorgecastillo2 this does not match the stated motivation: "Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions." Many discussions about this point are still open, and no concrete answer has been given yet.

dgarridoa commented 4 years ago

The gensim implementation based on it has the same problem: the number of topics inferred is always equal to T. I will have to use the C++ implementation instead.
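
One possible reading of the behaviour described in this thread: T acts as a truncation level, so the fitted model always carries T topic slots, and the "number of topics" has to be read off from how much weight each slot receives. Below is a minimal sketch of that idea, not code from this repository. It assumes you can extract the corpus-level variational stick parameters as a `(2, T-1)` array of Beta parameters; the function names and the 0.95 mass cutoff are hypothetical choices for illustration. In gensim the corresponding array appears to be stored on `HdpModel` as `m_var_sticks`, but treat that attribute name as an assumption to check against your version.

```python
import numpy as np


def expected_topic_weights(var_sticks):
    """Turn Beta stick parameters of shape (2, T-1) into T expected topic weights.

    Row 0 holds the first Beta parameter of each stick, row 1 the second.
    The layout is an assumption; adapt it to the implementation you use.
    """
    a, b = np.asarray(var_sticks, dtype=float)
    stick_means = a / (a + b)  # expected fraction broken off at each step
    # Stick length remaining before each break: 1, (1-m1), (1-m1)(1-m2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - stick_means)))
    weights = np.empty(len(stick_means) + 1)
    weights[:-1] = stick_means * remaining[:-1]
    weights[-1] = remaining[-1]  # the last topic takes whatever is left
    return weights


def count_effective_topics(weights, mass=0.95):
    """Smallest number of topics whose weights sum to at least `mass`."""
    sorted_weights = np.sort(weights)[::-1]
    n = int(np.searchsorted(np.cumsum(sorted_weights), mass)) + 1
    return min(n, len(weights))


if __name__ == "__main__":
    # Toy stick parameters for T = 4 (made up, not from a fitted model).
    sticks = np.array([[9.0, 1.0, 1.0],
                       [1.0, 1.0, 1.0]])
    w = expected_topic_weights(sticks)
    print(w, count_effective_topics(w, mass=0.93))
```

With stick parameters from a fitted model, a count well below T would suggest that most topic slots are effectively unused, even though the output still lists T topics.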