beta is the topic distribution matrix, of dimension K x V. Each row is the corresponding topic's discrete probability distribution over the vocabulary. Generally speaking, beta is what is learned when you train your model on your corpus.
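To make that concrete, here is a toy NumPy sketch of what beta looks like (illustrative only; the made-up vocabulary and values are not from this package):

import numpy as np

vocab = ["cell", "gene", "market", "price", "game"]   # V = 5 (made-up words)
beta = np.array([
    [0.40, 0.35, 0.10, 0.10, 0.05],   # topic 1: mostly biology words
    [0.05, 0.05, 0.45, 0.40, 0.05],   # topic 2: mostly finance words
])                                     # K = 2 topics

assert np.allclose(beta.sum(axis=1), 1.0)  # each row is a probability vector
print(vocab[int(np.argmax(beta[0]))])      # most probable word in topic 1 -> "cell"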
Yes, I know that.
But other LDA packages ask for alpha and beta as initial parameters.
I think there is a Dirichlet distribution over documents and topics with parameter alpha, and another Dirichlet distribution over topics and words with parameter beta. Isn't that right?
Each row of beta is initialized randomly from a Dirichlet distribution.
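For intuition, a minimal NumPy sketch of that kind of initialization (the package's actual routine may differ):

import numpy as np

K, V = 4, 1000                               # hypothetical sizes
rng = np.random.default_rng(42)

# A symmetric Dirichlet(1, ..., 1) draws each row uniformly from the V-simplex.
beta0 = rng.dirichlet(np.ones(V), size=K)    # shape (K, V)
print(beta0.sum(axis=1))                     # every row sums to 1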
I'm not sure there is a good reason for initializing beta to a particular value prior to training, since the main goal of topic models such as LDA or CTM is to learn the topics. You don't know the topics in advance, otherwise there would be no point in using the algorithm in the first place.
The only reason I know of to preset beta to a particular value would be if you had already trained a model and wanted to transfer its topic distribution into your new model (a warm start).
The situation with alpha is different. It makes sense to uniformly/symmetrically scale the alpha vector prior to training in order to try and control the amount of regularization. Otherwise, however, the same reasoning applies, e.g.
alpha = [0.1,0.1] # low regularization
alpha = [100,100] # high regularization
alpha = [0.1,100] # doesn't make sense, since you don't know the topics in advance.
Because beta is a stochastic matrix (rows are probability vectors), there is no concept of "scaling" beta in order to try and increase or decrease the amount of regularization.
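A quick NumPy demonstration of why that is (a sketch, not package code): any uniform scaling of beta is undone the moment you renormalize the rows, so it carries no information.

import numpy as np

rng = np.random.default_rng(1)
beta = rng.dirichlet(np.ones(6), size=3)              # toy 3 x 6 stochastic matrix

scaled = 100 * beta                                   # "scale" beta
renormalized = scaled / scaled.sum(axis=1, keepdims=True)
print(np.allclose(renormalized, beta))                # True: back where we started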
Maybe I wasn't clear. I am talking about \eta from this paper: http://www.cs.columbia.edu/~blei/papers/BleiNgJordan2003.pdf (Figure 7). It determines the word distributions.
So eta is the regularizing hyperparameter used in smoothed LDA.
Unfortunately smoothed LDA has not been implemented in this package, although it is something I am considering adding in the near future.
For now, if you need a smoothed version, you may consider applying additive smoothing to beta after you train your model.
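A minimal sketch of such post-hoc additive smoothing (my own illustration, assuming a NumPy beta; smooth_beta and the eta value are hypothetical, not part of this package):

import numpy as np

def smooth_beta(beta, eta=0.01):
    # Additive smoothing: no word keeps exactly zero probability in any topic.
    smoothed = beta + eta
    return smoothed / smoothed.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
beta = rng.dirichlet(np.ones(8), size=3)
beta[:, 0] = 0                                   # pretend some words got zero mass
beta = beta / beta.sum(axis=1, keepdims=True)    # renormalize the toy example

print(smooth_beta(beta).min() > 0)               # True: the zeros are smoothed away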
Thank you so much.
How do I set the beta parameter for the word distributions?
I know how to set alpha, but I have no idea how to set beta.
Thank you.