dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

Document-Topic Memberships #267

Closed fishdontfly closed 6 years ago

fishdontfly commented 6 years ago

What is the best way to determine an appropriate threshold for document-topic probabilities when tagging documents with topics? I've noticed that the probabilities returned by fit_transform can sometimes be counter-intuitive.

For example, a text-rich document that genuinely covers multiple topics might get a probability of 0.25 for each of 4 topics, while a nonsensical, sparse document could get a probability of 0.4 for a single topic, simply because the normalized probabilities must sum to 1.
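A toy illustration of the problem described above (the numbers are made up for this example): a fixed threshold skips the genuinely multi-topic document but tags the noisy one.

```r
# Hypothetical doc-topic probability rows, as returned by fit_transform():
rich   <- c(0.25, 0.25, 0.25, 0.25)  # long doc spread evenly over 4 topics
sparse <- c(0.40, 0.20, 0.20, 0.20)  # short/noisy doc, one topic dominates

threshold <- 0.3
which(rich   >= threshold)  # integer(0): the rich document gets no tags
which(sparse >= threshold)  # 1: the noisy document gets tagged
```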

dselivanov commented 6 years ago

I don't see an issue with the first case: according to the model, the document belongs to 4 topics in equal proportions. If that doesn't match human judgement, you may need to try different hyper-parameters.

The second case happens because, as you mentioned, the probabilities are normalized to sum to 1. This is where the doc-topic prior helps. By default, transform and fit_transform don't add the prior (which is not correct according to the LDA model definition, but gives much sparser doc-topic assignments and works well in practice). So if your texts are short, you may want to add the priors; this makes the model less confident about topic assignments, which is essentially regularization. Check the code here - https://github.com/dselivanov/text2vec/blob/c3196d8655709c20d82f9946cf8a041d1c7f5364/R/model_LDA.R#L32-L34
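A minimal sketch of the regularization described above, done post hoc: recover approximate per-topic token counts from the normalized doc-topic matrix, add the prior back in, and renormalize. The `dtm` variable and all prior values are assumptions for illustration, not recommendations.

```r
library(text2vec)
library(Matrix)

# `dtm` is assumed to be a sparse document-term matrix built elsewhere
# with itoken()/create_dtm(); the prior values below are illustrative.
doc_topic_prior <- 0.1
lda <- LDA$new(n_topics = 10,
               doc_topic_prior = doc_topic_prior,
               topic_word_prior = 0.01)

# fit_transform() returns rows normalized to 1 *without* the prior
doc_topic <- lda$fit_transform(dtm, n_iter = 1000, convergence_tol = 1e-3)

# Post-hoc smoothing: scale each row by the document length to get
# approximate counts, add the prior, and renormalize. Short documents
# are pulled toward the uniform distribution, so a single topic in a
# sparse document no longer looks spuriously confident.
counts   <- doc_topic * Matrix::rowSums(dtm)
smoothed <- (counts + doc_topic_prior) /
            rowSums(counts + doc_topic_prior)
```

The effect is strongest for short documents: their raw counts are small relative to the prior, so the smoothed distribution is flatter, while long documents are barely changed.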