I would actually presume that `LDA$topic_word_distribution` gives phi. That's calculated along the way in most implementations, including text2vec's WarpLDA. It takes post-processing (basically Bayes' rule) to get gamma.
`LDA$components` gives you unnormalized topic-word counts. You can calculate P(token|topic) or P(topic|token) just by normalizing this matrix by row or column, i.e. making each row or column have unit L1 norm: divide each element by the corresponding row or column sum.
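A minimal sketch of that normalization, assuming `lda_model` is a fitted text2vec LDA model (the variable name is hypothetical; `normalize()` is the text2vec helper quoted later in this thread):

```r
library(text2vec)

# `lda_model$components` is an unnormalized topics x tokens count matrix.
# phi: P(token | topic) -- give each row (topic) unit L1 norm
phi = normalize(lda_model$components, "l1")

# gamma: P(topic | token) -- transpose so tokens are rows, then row-normalize
gamma = normalize(t(lda_model$components), "l1")

# Sanity check: every row of phi should sum to 1
# (gamma rows too, assuming every token occurs in at least one topic)
stopifnot(all(abs(rowSums(phi) - 1) < 1e-8))
```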
Thank you both for the prompt assistance!

@TommyJones: you're right :+1:

@dselivanov: hope I understand you right...

P(token|topic) = `LDA$topic_word_distribution` = `normalize(LDA$components, 'l1')`
P(topic|token) = `normalize(t(LDA$components), 'l1')`
When I compared the LDA output (using the first 500 rows of the movie_review data) of the text2vec and topicmodels packages, I noticed a significant difference (with the seed fixed). For the word "wild", topicmodels assigned noticeable weight to 6 topics, while text2vec assigned it to only 2 topics. I presume this is due to the differing implementations of LDA?
> P(token|topic) = `LDA$topic_word_distribution` = `normalize(LDA$components, 'l1')`
> P(topic|token) = `normalize(t(LDA$components), 'l1')`

Yes.

> When I compared the LDA output (using the first 500 rows of the movie_review data) of the text2vec and topicmodels packages, I noticed a significant difference (with the seed fixed).

Not sure about the difference. It may be underfitting/overfitting or different hyper-parameters (priors). Try checking the perplexity of both models.
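A hedged sketch of that perplexity check, assuming `dtm` is the sparse document-term matrix the text2vec model was fitted on (variable names are hypothetical):

```r
library(text2vec)

# Fit/transform to get the document-topic distribution (theta)
doc_topic = lda_model$fit_transform(dtm, n_iter = 1000)

# text2vec's perplexity() takes the data plus both fitted distributions;
# lower perplexity on held-out data is better
perplexity(dtm,
           topic_word_distribution = lda_model$topic_word_distribution,
           doc_topic_distribution  = doc_topic)

# For the topicmodels fit, the perplexity() generic works on the model object,
# e.g. topicmodels::perplexity(tm_fit, newdata = dtm_tm) with a
# DocumentTermMatrix (tm_fit and dtm_tm are hypothetical names).
```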
This is a question on the LDA implementation.

In the textmineR package, the LDA model outputs theta = P(topic|document), phi = P(token|topic), and gamma = P(topic|token).

In text2vec, theta can be obtained from `LDA$fit_transform()`.

Question: I presume `LDA$topic_word_distribution` gives gamma; is it possible to extract phi, P(token|topic)?

Thanks!
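For completeness, a minimal sketch of obtaining theta in text2vec as described above (hypothetical names; `dtm` is a sparse document-term matrix and the priors are purely illustrative):

```r
library(text2vec)

# Hypothetical setup: 10 topics with illustrative priors
lda_model = LDA$new(n_topics = 10, doc_topic_prior = 0.1, topic_word_prior = 0.01)

# theta: P(topic | document), one row per document
theta = lda_model$fit_transform(dtm, n_iter = 1000)
```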