Closed mmantyla closed 6 years ago
I personally don't think we ever need to try to optimize perplexity. As many articles have shown, perplexity is only weakly correlated with human judgment. And yes, generally the more topics, the better the perplexity.
So this is why I'm very happy to see #252 and Manuel's fantastic work.
I think we can close this now that #252 has been merged. In practice, it makes sense to use coherence measures for cross-validation.
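For anyone landing here later: one common coherence measure is UMass coherence, which scores a topic's top words by how often they co-occur in documents. A minimal stdlib-only sketch on a toy corpus (the function name and corpus are illustrative, not the implementation from #252):

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence: sum over ordered word pairs of
    log((D(w_i, w_j) + 1) / D(w_j)), where D(.) counts documents
    containing the given word(s). Higher (closer to 0) is better."""
    doc_sets = [set(d) for d in docs]

    def doc_count(*words):
        return sum(all(w in ds for w in words) for ds in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            score += math.log((doc_count(w_i, w_j) + 1) / doc_count(w_j))
    return score

# Toy corpus: a coherent topic's top words co-occur often,
# a mixed topic's top words do not.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "leash"],
    ["stock", "market", "price"],
    ["stock", "price", "trade"],
]
coherent = umass_coherence(["cat", "dog"], docs)
mixed = umass_coherence(["cat", "stock"], docs)
print(coherent > mixed)  # True
```

The point of using a measure like this for cross-validation is that, unlike perplexity, it does not keep improving just because k grows.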
I have spent quite some time tuning k and the hyperparameters alpha and beta. I have used the DEoptim package, which works fine. I have found that there is a relationship between an increase in k and a decrease in beta. The frustrating part is that the best perplexity values are obtained with a very high k (k = n_of_input / 4). For example, with 4,000 documents this results in 1,000 topics, which makes no sense for any purpose I can think of. At the same time, beta becomes very low, e.g. < 0.0001.
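For context, perplexity is just the exponential of the negative average per-token held-out log-likelihood, which is why a model with ever more topics can keep nudging it down. A minimal sketch with made-up numbers (the log-likelihood value is purely hypothetical):

```python
import math

def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-log p(held-out data) / token count).
    Lower is better; adding topics typically lowers it further,
    which is exactly the pathology described above."""
    return math.exp(-total_log_likelihood / n_tokens)

# Hypothetical held-out set: 10,000 tokens with a total
# log-likelihood of -70,000 nats gives perplexity exp(7).
print(round(perplexity(-70_000, 10_000), 1))  # 1096.6
```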
The same relationship between increasing k and decreasing beta can be seen on Slide 14 here: http://phusewiki.org/wiki/images/c/c9/Weizhong_Presentation_CDER_Nov_9th.pdf In their model, the best perplexity value is found at the maximum k and minimum beta. However, they do not go below beta = 0.01, although their graph suggests there would be further improvement.
Given that this relationship seems to hold in general, I think it would be best to simply fix beta to some value and tune only k and alpha. What would be the smallest beta that would still make sense?
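One way to build intuition for "how small is too small" is to sample topic-word distributions from a symmetric Dirichlet and see how many words effectively carry the mass: as beta shrinks, each topic concentrates on ever fewer words. A stdlib-only sketch (vocabulary size and the beta values are illustrative):

```python
import math
import random

def sample_dirichlet(beta, vocab_size, rng):
    """Draw one topic-word distribution from a symmetric
    Dirichlet(beta) by normalizing independent Gamma(beta, 1) draws."""
    gammas = [rng.gammavariate(beta, 1.0) for _ in range(vocab_size)]
    total = sum(gammas)
    return [g / total for g in gammas]

def effective_vocab(probs):
    """exp(entropy): roughly how many words carry the topic's mass."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(h)

rng = random.Random(0)
dense = sample_dirichlet(1.0, 1000, rng)    # beta = 1: spread out
sparse = sample_dirichlet(0.01, 1000, rng)  # small beta: concentrated

print(round(effective_vocab(dense)))   # mass spread over hundreds of words
print(round(effective_vocab(sparse)))  # mass on far fewer words
```

This suggests the "smallest sensible beta" depends on how many words per topic you consider interpretable, rather than on what the perplexity optimum says.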