ericproffitt / TopicModelsVB.jl

A Julia package for variational Bayesian topic modeling.

Why is model.phi[1] used everywhere in the code instead of model.phi[d]? #32

Closed ValeriiBaidin closed 4 years ago

ValeriiBaidin commented 4 years ago

Hi. I am sorry to bother you. I am wondering why the code uses model.phi[1] everywhere and not model.phi[d]. Is this correct?

ericproffitt commented 4 years ago

Hi Valerii,

In order to significantly reduce memory overhead, I decided not to store each document's phi parameter. Thus model.phi is overwritten for each document.

If you want to recompute phi for a particular document d, you can run,

TopicModelsVB.update_phi!(model, d)

Then model.phi[1] will be the phi corresponding to document d.
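To give a rough sense of the memory saving, here is a back-of-the-envelope sketch. The document lengths and topic count are made-up values for illustration, not numbers from the package or any real corpus, and it assumes phi is stored as a dense Float64 matrix with one row per topic and one column per term.

```julia
# Hypothetical corpus: number of terms in each document (made-up values).
doc_lengths = [10, 20, 15, 40]
K = 4  # number of topics (also hypothetical)

bytes_per_entry = sizeof(Float64)

# Cost of keeping a K x N_d phi matrix for every document at once.
bytes_all = sum(K * n * bytes_per_entry for n in doc_lengths)

# Cost of keeping a single phi matrix; the worst case is the longest document.
bytes_one = K * maximum(doc_lengths) * bytes_per_entry

println("phi stored for all documents: ", bytes_all, " bytes")
println("phi stored in a single slot:  ", bytes_one, " bytes")
```

For a real corpus the first figure grows with the total number of terms across all documents, while the second is bounded by the longest document.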

ValeriiBaidin commented 4 years ago

Oh, yes, I see. It's just because you didn't need to keep it for each document.

So why, in this case, is phi still a vector? Is it just left over from a previous version?

ericproffitt commented 4 years ago

I forget the exact reason why I kept it as a vector, but basically it led to a more elegant code architecture.

> It's just because, you didn't need to keep it for each document.

Correct.

Strictly speaking, for the LDA model, the only parameters that must be kept are alpha, beta, and Elogtheta; all other parameters can be regenerated from those three.
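As a rough illustration of that regeneration, the sketch below rebuilds phi and then gamma for one document from alpha, beta, and Elogtheta. It uses toy stand-in arrays rather than a real model: all values, the variable names terms and counts, and the gamma formula (the standard LDA variational update, gamma = alpha + phi * counts) are assumptions for the example, not code taken from the package.

```julia
# Toy stand-ins (hypothetical values, not taken from the package).
K, V = 4, 50                          # number of topics, vocabulary size
alpha = fill(0.5, K)                  # Dirichlet hyperparameter
beta = rand(K, V)
beta ./= sum(beta, dims=2)            # each row is a distribution over the vocabulary
Elogtheta_d = [-1.2, -0.8, -2.0, -1.5]  # placeholder for one document's E[log theta]

terms = [3, 7, 12, 21, 42]            # vocabulary indices of the document's terms
counts = [1, 2, 1, 3, 1]              # how often each term occurs in the document

# Regenerate phi for this document, using the same formula as in this thread.
phi = beta[:, terms] .* exp.(Elogtheta_d)
phi ./= sum(phi, dims=1)              # normalize each column over topics

# Regenerate gamma with the standard LDA variational update.
gamma = alpha + phi * counts

println(size(phi))
println(gamma)
```

Because each column of phi sums to one, sum(gamma) equals sum(alpha) plus the total term count of the document, which is a quick sanity check.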

ValeriiBaidin commented 4 years ago

So my worry is: how does phi[1] match the length of every document?

ericproffitt commented 4 years ago

It changes in size for different documents.

ValeriiBaidin commented 4 years ago

> It changes in size for different documents.

model.phi[1] = model.beta[:,terms] .* exp.(model.Elogtheta[d])

So what happens if length(d) > length(phi[1])? The size of terms comes from document d.

ericproffitt commented 4 years ago

I'm not sure I understand your question.

phi is just a vector which contains a single element of matrix type; that element is switched out for each document.

So if you have 4 topics and length(doc1) = 10, then

terms = model.corp[1].terms
model.phi[1] = model.beta[:,terms] .* exp.(model.Elogtheta[1])
model.phi[1] ./= sum(model.phi[1], dims=1)

model.phi[1] will be a matrix of size 4 x 10.

Then, if doc2 has 20 terms,

terms = model.corp[2].terms
model.phi[1] = model.beta[:,terms] .* exp.(model.Elogtheta[2])
model.phi[1] ./= sum(model.phi[1], dims=1)

model.phi[1] will be a matrix of size 4 x 20.
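The same behaviour can be reproduced without the package. The sketch below uses toy arrays (beta, Elogtheta, and docs here are hypothetical stand-ins, not the model's actual fields) and assumes phi is a one-element vector of matrices. The point it illustrates is that `phi[1] = ...` is a plain assignment that puts a newly allocated matrix into the slot, so nothing forces the new matrix to have the size of the old one.

```julia
K, V = 4, 100                            # number of topics, vocabulary size
beta = rand(K, V)                        # toy topic-term weights
Elogtheta = [fill(-1.0, K), fill(-2.0, K)]  # placeholder values for two documents

# Toy corpus: document 1 has 10 terms, document 2 has 20 terms.
docs = [collect(1:10), collect(31:50)]

# A vector with a single slot that holds one matrix at a time.
phi = Vector{Matrix{Float64}}(undef, 1)

sizes = Tuple{Int,Int}[]
for d in 1:2
    terms = docs[d]
    # Assignment replaces the stored matrix; it does not write into the old one.
    phi[1] = beta[:, terms] .* exp.(Elogtheta[d])
    phi[1] ./= sum(phi[1], dims=1)       # in-place normalization of the new matrix
    push!(sizes, size(phi[1]))
end

println(sizes)
```

Only an in-place broadcast assignment such as `phi[1] .= ...` would require the right-hand side to match the existing size; the thread's code uses `=` for the first step.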

ValeriiBaidin commented 4 years ago

Oh, I am sorry. I thought the size was fixed. Thank you.