ericproffitt / TopicModelsVB.jl

A Julia package for variational Bayesian topic modeling.

How to set the beta parameter for word distributions? #36

Closed by ValeriiBaidin 4 years ago

ValeriiBaidin commented 4 years ago

How to set the beta parameter for word distributions?

I know how to set alpha, but I have no idea how to set beta.

Thank you.

ericproffitt commented 4 years ago

beta is the topic distribution matrix, of dimension K x V. Each row is the corresponding topic's discrete probability distribution over the vocabulary.

Generally speaking, beta is what is learned when you train your model on your corpus.
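For concreteness, here's a sketch of what that looks like (assuming the readcorp/LDA/train! workflow from the README, with the trained matrix exposed as model.beta):

using TopicModelsVB

corp = readcorp(:nsf)      # example corpus bundled with the package
model = LDA(corp, 9)       # K = 9 topics
train!(model, iter=150)    # run variational Bayes

size(model.beta)           # (K, V): one row per topic
sum(model.beta, dims=2)    # each row sums to 1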

ValeriiBaidin commented 4 years ago

> beta is the topic distribution matrix, of dimension K x V. Each row is the corresponding topic's discrete probability distribution over the vocabulary.
>
> Generally speaking, beta is what is learned when you train your model on your corpus.

Yes, I know it.

But other LDA packages ask for alpha and beta as initial parameters.

I think there is a Dirichlet distribution over topics for each document, with parameter alpha, and another Dirichlet distribution over words for each topic, with parameter beta. Isn't that right?

ericproffitt commented 4 years ago

Each row of beta is initialized randomly from a Dirichlet distribution.

I'm not sure there is a good reason for initializing beta to a particular value prior to training, since the main goal of topic models such as LDA or CTM is to learn the topics. You don't know the topics in advance, otherwise there would be no point in using the algorithm in the first place.

The only reason I know of to preset beta to a particular value would be if you had already trained a model and wanted to transfer its topic distributions into your new model (a warm start).
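A sketch of that warm start, assuming beta is a plain writable field and both models share the same vocabulary and number of topics:

new_model = LDA(new_corp, 9)       # hypothetical new corpus, same number of topics
new_model.beta = old_model.beta    # copy the trained topic matrix (warm start)
train!(new_model, iter=150)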

The situation with alpha is different. It makes sense to uniformly/symmetrically scale the alpha vector prior to training in order to control the amount of regularization. Otherwise, however, the same reasoning applies, e.g.

alpha = [0.1, 0.1]  # low regularization
alpha = [100, 100]  # high regularization
alpha = [0.1, 100]  # doesn't make sense, since you don't know the topics in advance

Because beta is a stochastic matrix (its rows are probability vectors), there is no notion of "scaling" beta to increase or decrease the amount of regularization; any scaling would simply be undone by renormalization.
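You can see this directly: scale a row, renormalize, and you get the original distribution back.

beta_row = [0.5, 0.3, 0.2]   # a topic's distribution over a 3-word vocabulary
scaled = 10 .* beta_row      # attempt to "scale" it
scaled ./ sum(scaled)        # renormalizing just recovers [0.5, 0.3, 0.2]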

ValeriiBaidin commented 4 years ago

> Each row of beta is initialized randomly from a Dirichlet distribution.
>
> I'm not sure there is a good reason for initializing beta to a particular value prior to training, since the main goal of topic models such as LDA or CTM is to learn the topics. You don't know the topics in advance, otherwise there would be no point in using the algorithm in the first place.
>
> The only reason I know of to preset beta to a particular value would be if you had already trained a model and wanted to transfer its topic distributions into your new model (a warm start).
>
> The situation with alpha is different. It makes sense to uniformly/symmetrically scale the alpha vector prior to training in order to control the amount of regularization. Otherwise, however, the same reasoning applies, e.g.
>
> alpha = [0.1, 0.1]  # low regularization
> alpha = [100, 100]  # high regularization
> alpha = [0.1, 100]  # doesn't make sense, since you don't know the topics in advance
>
> Because beta is a stochastic matrix (its rows are probability vectors), there is no notion of "scaling" beta to increase or decrease the amount of regularization; any scaling would simply be undone by renormalization.

Maybe I wasn't clear. I am talking about \eta from this paper http://www.cs.columbia.edu/~blei/papers/BleiNgJordan2003.pdf (Figure 7); it determines the word distributions.

ericproffitt commented 4 years ago

So eta is the regularizing hyperparameter used in smoothed LDA.

Unfortunately smoothed LDA has not been implemented in this package, although it is something I am considering adding in the near future.

For now, if you need a smoothed version, you may consider applying additive smoothing to beta after you train your model.
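For instance, something along these lines (just a sketch; smooth_beta is an illustrative helper, not part of the package, and eta here plays the role of the smoothing hyperparameter):

function smooth_beta(beta::Matrix{Float64}, eta::Float64)
    smoothed = beta .+ eta             # add eta pseudo-mass to every word
    smoothed ./ sum(smoothed, dims=2)  # renormalize rows to probability vectors
end

beta_smoothed = smooth_beta(model.beta, 0.01)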

ValeriiBaidin commented 4 years ago

> So eta is the regularizing hyperparameter used in smoothed LDA.
>
> Unfortunately smoothed LDA has not been implemented in this package, although it is something I am considering adding in the near future.
>
> For now, if you need a smoothed version, you may consider applying additive smoothing to beta after you train your model.

Thank you so much.