Document how to acquire phi and theta from an LDA topic model

Description

For evaluating and understanding an LDA topic model, phi and theta are essential. The documentation should explain how these can be acquired.

Link to Documentation Page

https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Latent-Dirichlet-Allocation

Besides explaining how phi and theta can be acquired, or approximated, the documentation could better explain what the numbers in the human-readable model represent. For now it says "columns 2-n represent the per-word topic distributions". I think that to be useful these numbers need to be normalised, I guess so that each row sums to 1, in which case each number represents p(t|w), i.e. the probability of the topic given the word.
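To make the row normalisation concrete, here is a minimal sketch in plain Python. It assumes the per-word topic values have already been read into a list of rows (one row per word, one column per topic) and that they are non-negative; the numbers and the helper name `normalise_rows` are made up for illustration and are not VW output or a VW API. Whether the result is exactly p(t|w) depends on what the stored values actually represent, which the wiki does not currently say.

```python
def normalise_rows(weights):
    """Scale each row so it sums to 1; all-zero rows are left as zeros."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([v / total if total > 0 else 0.0 for v in row])
    return result


# Hypothetical weights for three words over two topics (not real VW output).
word_topic_weights = [
    [3.0, 1.0],
    [0.5, 1.5],
    [2.0, 2.0],
]

p_topic_given_word = normalise_rows(word_topic_weights)
print(p_topic_given_word[0])  # [0.75, 0.25]
```

Each row of `p_topic_given_word` now sums to 1, so under the assumption above it can be read as a distribution over topics for that word.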

Another statistic one needs in order to understand a topic model is p(w|t), i.e. the probability of the term given the topic. I believe this is phi.
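If the values in the readable model are unnormalised topic-word weights (an assumption, not something the wiki confirms), then an estimate of p(w|t) would come from normalising each topic column over the vocabulary instead of each word row. A rough sketch, again with made-up numbers and a hypothetical helper `normalise_columns`:

```python
def normalise_columns(weights):
    """Scale each column so it sums to 1 over all rows (words)."""
    n_topics = len(weights[0])
    totals = [sum(row[k] for row in weights) for k in range(n_topics)]
    return [
        [row[k] / totals[k] if totals[k] > 0 else 0.0 for k in range(n_topics)]
        for row in weights
    ]


# Hypothetical weights: rows are words, columns are topics (not real VW output).
word_topic_weights = [
    [3.0, 1.0],
    [1.0, 1.0],
    [4.0, 2.0],
]

p_word_given_topic = normalise_columns(word_topic_weights)
for row in p_word_given_topic:
    print(row)
```

Here every column sums to 1, so column k can be read as a distribution over words for topic k. This is only a candidate for phi under the stated assumption; the documentation would need to confirm what the stored numbers are.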

https://stackoverflow.com/questions/65727712/can-ldavis-analyse-the-results-of-vowpal-wabbit-lda is essentially about this problem too.

VowpalWabbit / vowpal_wabbit

Document how to acquire phi and theta from an LDA topic model #2780
