Closed mmantyla closed 6 years ago
I personally don't think we ever need to try to optimize perplexity. As many articles have shown, perplexity is only weakly correlated with human judgment. And yes, generally the more topics, the better the perplexity.
So this is why I'm very happy to see #252 and Manuel's fantastic work.
I think we can close this now that #252 has been merged. In practice, it makes sense to use coherence measures for cross-validation.
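For anyone landing here later: one common coherence measure is UMass coherence, which scores a topic's top words by how often they co-occur in documents. A minimal stdlib-only sketch on a toy corpus (the function name and corpus are illustrative, not the implementation from #252):

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence: sum over ordered word pairs of
    log((D(w_i, w_j) + 1) / D(w_j)), where D(.) counts documents
    containing the given word(s). Higher (closer to 0) is better."""
    doc_sets = [set(d) for d in docs]

    def doc_count(*words):
        return sum(all(w in ds for w in words) for ds in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            score += math.log((doc_count(w_i, w_j) + 1) / doc_count(w_j))
    return score

# Toy corpus: a coherent topic's top words co-occur often,
# a mixed topic's top words do not.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "leash"],
    ["stock", "market", "price"],
    ["stock", "price", "trade"],
]
coherent = umass_coherence(["cat", "dog"], docs)
mixed = umass_coherence(["cat", "stock"], docs)
print(coherent > mixed)  # True
```

The point of using a measure like this for cross-validation is that, unlike perplexity, it does not keep improving just because k grows.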
I have spent quite some time tuning k and the hyperparameters alpha and beta. I have used the DEoptim package, which works fine. I have found that there is a relationship between an increase in k and a decrease in beta. The frustrating part is that the best perplexity values are obtained with a very high k (k = n_of_input / 4). For example, with 4,000 documents this results in 1,000 topics, which makes no sense for any purpose I can think of. At the same time, beta becomes very low, e.g. < 0.0001.
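For context, perplexity is just the exponential of the negative average per-token held-out log-likelihood, which is why a model with ever more topics can keep nudging it down. A minimal sketch with made-up numbers (the log-likelihood value is purely hypothetical):

```python
import math

def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-log p(held-out data) / token count).
    Lower is better; adding topics typically lowers it further,
    which is exactly the pathology described above."""
    return math.exp(-total_log_likelihood / n_tokens)

# Hypothetical held-out set: 10,000 tokens with a total
# log-likelihood of -70,000 nats gives perplexity exp(7).
print(round(perplexity(-70_000, 10_000), 1))  # 1096.6
```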
The same relationship between increasing k and decreasing beta can be seen on Slide 14 here: http://phusewiki.org/wiki/images/c/c9/Weizhong_Presentation_CDER_Nov_9th.pdf In their model, the best perplexity value is found at the maximum k and minimum beta. However, they do not go below beta = 0.01, although their graph suggests there would be further improvement.
Given that this relationship seems to hold in general, I think it would be best to simply fix beta to some value and tune only k and alpha. What would be the smallest beta that would still make sense?
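One way to build intuition for "how small is too small" is to sample topic-word distributions from a symmetric Dirichlet and see how many words effectively carry the mass: as beta shrinks, each topic concentrates on ever fewer words. A stdlib-only sketch (vocabulary size and the beta values are illustrative):

```python
import math
import random

def sample_dirichlet(beta, vocab_size, rng):
    """Draw one topic-word distribution from a symmetric
    Dirichlet(beta) by normalizing independent Gamma(beta, 1) draws."""
    gammas = [rng.gammavariate(beta, 1.0) for _ in range(vocab_size)]
    total = sum(gammas)
    return [g / total for g in gammas]

def effective_vocab(probs):
    """exp(entropy): roughly how many words carry the topic's mass."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(h)

rng = random.Random(0)
dense = sample_dirichlet(1.0, 1000, rng)    # beta = 1: spread out
sparse = sample_dirichlet(0.01, 1000, rng)  # small beta: concentrated

print(round(effective_vocab(dense)))   # mass spread over hundreds of words
print(round(effective_vocab(sparse)))  # mass on far fewer words
```

This suggests the "smallest sensible beta" depends on how many words per topic you consider interpretable, rather than on what the perplexity optimum says.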