dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

LDA: is it possible to set priors on word counts for some topics? #277

Closed dominiqueemmanuel closed 4 years ago

dominiqueemmanuel commented 6 years ago

Hello,

This is just a "feature request" issue.

I am wondering whether it is possible, in the LDA topic modeling implemented in text2vec, to set priors on the word counts of a few topics.

For example, suppose we want to extract 50 topics, but we know (as prior information) that topic 1 is composed (approximately) of a few words we already know, w1,...,w5 for instance, and topic 2 of w6,...,w12.

Is it possible to initialise the topic_word_count (in LatentDirichletAllocation?) before it starts to fit the model?
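
To make the idea concrete, here is a minimal, library-agnostic sketch in Python of what such a seeded initial count matrix could look like. It does not use the text2vec API: the function name `seeded_topic_word_counts`, the `seed_weight` value of 100, and the 20-word toy vocabulary are all hypothetical and chosen only for illustration. Whether text2vec could accept such a matrix as a starting point is exactly the open question here.

```python
def seeded_topic_word_counts(vocab, n_topics, seed_words, seed_weight=100):
    """Build an n_topics x len(vocab) matrix of initial pseudo-counts.

    seed_words maps a 0-based topic index to the words believed to belong
    to that topic. Seeded cells start at seed_weight, all others at 0.
    (Hypothetical helper, not part of text2vec.)
    """
    word_index = {word: j for j, word in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in range(n_topics)]
    for topic, words in seed_words.items():
        for word in words:
            counts[topic][word_index[word]] += seed_weight
    return counts


# Toy vocabulary w1..w20 (assumed for the example).
vocab = [f"w{i}" for i in range(1, 21)]

# Topic 1 is seeded with w1..w5, topic 2 with w6..w12 (0-based indices 0 and 1).
seeds = {
    0: [f"w{i}" for i in range(1, 6)],
    1: [f"w{i}" for i in range(6, 13)],
}

counts = seeded_topic_word_counts(vocab, n_topics=50, seed_words=seeds)
```

The remaining 48 topics start with all-zero rows, so only the two topics with prior knowledge are biased towards their seed words.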

Best regards, Dom

dselivanov commented 4 years ago

If someone needs this, a PR is welcome. I'm not going to work on that.