Open Abigale001 opened 6 years ago
Before that, the authors say "In a traditional setting, where fitting multiple models might be viable", and after that they say: "However, these techniques become impractical when the data set size is large, and they become impossible when the data are streaming. Online HDP provides the speed of online variational Bayes with the modeling flexibility of the HDP."
T=150, and by default only 20 topics are shown, ordered by relevance.
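To make the point about T concrete: T is the top-level truncation, so the model always carries T topic slots, and what changes is how much weight each slot gets. Here is a minimal sketch of how one might count the topics that actually carry weight. It assumes you can read the two Beta parameters of each corpus-level stick from the fitted model (I believe both this code and gensim keep them in `m_var_sticks`, but treat that as an assumption). The names `expected_topic_weights` and `count_effective_topics`, the toy values, and the 0.05 threshold are all made up for illustration and are not from the paper.

```python
def expected_topic_weights(a, b):
    """Expected corpus-level topic weights under truncated stick-breaking.

    a[i] and b[i] are the two Beta parameters of stick i (T-1 sticks in total).
    Returns T weights that sum to 1.
    """
    weights = []
    remaining = 1.0  # stick length not yet assigned to any topic
    for a_i, b_i in zip(a, b):
        stick = a_i / (a_i + b_i)  # mean of Beta(a_i, b_i)
        weight = stick * remaining
        weights.append(weight)
        remaining -= weight
    weights.append(remaining)  # the last topic takes whatever is left
    return weights


def count_effective_topics(weights, threshold=0.05):
    """Count topics whose expected weight exceeds a (hypothetical) threshold."""
    return sum(1 for w in weights if w > threshold)


# Toy Beta parameters for T = 5 (illustrative values, not from a real run).
a = [6.0, 6.0, 9.0, 1.0]
b = [4.0, 4.0, 1.0, 9.0]

weights = expected_topic_weights(a, b)
print(len(weights))                      # number of topic slots, i.e. T
print(count_effective_topics(weights))   # slots with non-negligible weight
```

The threshold is a heuristic choice, so the "effective" count depends on it; the sketch only shows that the number of slots and the number of topics in use are different quantities.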
@jorgecastillo2 This does not match the stated motivation: "Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions." Many discussions about this point are still open, and no concrete answer has been given yet.
The gensim implementation based on it has the same problem: the number of topics inferred is always equal to T. I will have to use the C++ implementation instead.
In the paper, the authors say the number of topics can be determined with cross-validation or held-out likelihood. But when I run the code with the default T and K, the number of topics is always equal to T. Does anyone know why?