VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

Document how to acquire phi and theta from an LDA topic model #2780

Open hans-ekbrand opened 3 years ago

hans-ekbrand commented 3 years ago

Description

For evaluating and understanding an LDA topic model, phi and theta are essential. The documentation really should explain how these can be acquired.

Link to Documentation Page

https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Latent-Dirichlet-Allocation

In addition to saying something about how phi and theta can be acquired, or approximated, the documentation of what the numbers in the human-readable model represent could be improved. For now it says "columns 2-n represent the per-word topic distributions". I think that, to be useful, these numbers need to be normalised, I guess so that each row sums to 1, in which case each number represents p(t|w), i.e. the probability of the topic given the word.
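As a minimal sketch of the row normalisation guessed at above: the matrix below is made-up toy data standing in for columns 2-n of the readable model (one row per word, one column per topic), and `normalise_rows` is a hypothetical helper, not part of Vowpal Wabbit. Whether the raw numbers can legitimately be read as p(t|w) after this step is an assumption that the documentation would need to confirm.

```python
# Toy per-word topic weights, made up for illustration only.
# One row per word, one column per topic; the word-index column is
# assumed to have been stripped already.
weights = [
    [0.1, 0.3],
    [2.0, 2.0],
    [0.6, 0.2],
]


def normalise_rows(matrix):
    """Scale each row to sum to 1 (rows are assumed to have a positive sum)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total for value in row])
    return result


# Under the assumption above, p_topic_given_word[w][t] approximates p(t|w).
p_topic_given_word = normalise_rows(weights)
```

After this step every row is a distribution over topics for one word, so rows can be compared across words regardless of how frequent each word is.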

Another statistic one needs in order to understand a topic model is p(w|t), i.e. the probability of the term given the topic. I believe this is phi.
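If the per-word topic weights really are unnormalised topic-word weights (an assumption, not something the current documentation states), phi could be approximated by normalising over words instead of over topics, so that each column sums to 1. The sketch below uses made-up toy numbers, and `normalise_columns` is a hypothetical helper name.

```python
# Toy per-word topic weights, made up for illustration only.
# One row per word, one column per topic.
weights = [
    [1.0, 3.0],
    [2.0, 2.0],
    [1.0, 5.0],
]


def normalise_columns(matrix):
    """Scale each column to sum to 1 (columns are assumed to have a positive sum)."""
    n_topics = len(matrix[0])
    column_totals = [sum(row[t] for row in matrix) for t in range(n_topics)]
    return [
        [row[t] / column_totals[t] for t in range(n_topics)]
        for row in matrix
    ]


# Under the assumption above, phi[w][t] approximates p(w|t).
phi = normalise_columns(weights)
```

Here each column is a distribution over the vocabulary for one topic, which is the shape that tools expecting a topic-term matrix typically want (possibly transposed to topics by terms).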

https://stackoverflow.com/questions/65727712/can-ldavis-analyse-the-results-of-vowpal-wabbit-lda is essentially about this problem too.

olgavrou commented 3 years ago

Looking