adjidieng / ETM

Topic Modeling in Embedding Spaces

Any suggestion on multiple topics for one document? #5

Open WalterKung opened 4 years ago

WalterKung commented 4 years ago

Thank you for your work on the ETM model. I applied ETM to my documents, and it produced clearer-cut topics than LDA did.

The original LDA can assign multiple topics to a single document. In the paper, you use a softmax for theta, the topic proportions. The softmax tends to concentrate the mass on one topic per document. I am wondering if you can suggest how I can use ETM to get multiple topics from a single document. I am using get_theta(normalized_data_batch) to get the topic distribution.

https://github.com/WalterKung/DataConference2020/blob/master/P2_TOPIC_MODEL/SS_TOPIC_MODEL_Stock_by_news.ipynb
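
For context, this is roughly how I call it (a sketch following the batching in the repo's main.py; `model` is assumed to be a trained ETM instance from the training run):

```python
import torch

# Toy [num_docs, vocab_size] bag-of-words counts standing in for a real batch.
data_batch = torch.randint(0, 5, (8, 3000)).float()

# Normalize each document's counts to sum to 1 before encoding.
sums = data_batch.sum(1).unsqueeze(1)
normalized_data_batch = data_batch / sums

# get_theta returns (theta, kld_theta); each row of theta is a
# topic-proportion vector that sums to 1.
theta, _ = model.get_theta(normalized_data_batch)  # [8, num_topics]
```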

dubbsbrandon commented 4 years ago

Is there any chance that you could explain the parameters? I'm having a bit of trouble using them properly. An example would be really helpful.

gokceneraslan commented 4 years ago

Could this be related to the isotropic Gaussian prior over the theta logits (as in typical VAEs)?

qixiang109 commented 4 years ago

My guess: given the logistic-normal construction, a smaller sigma on the Gaussian prior would give you smoother topic proportions. However, this implementation does not seem to allow a configurable sigma; the Gaussian prior is hard-coded to (mu=0, sigma=1) in the encoder. Could you change that and report back here?
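
For concreteness, a sketch of what a configurable prior term could look like (an assumption on my part that logsigma_theta holds a log-variance, which is how reparameterize() appears to treat it; prior_sigma=1.0 recovers the hard-coded case):

```python
import math
import torch

def kl_to_gaussian_prior(mu_theta, logsigma_theta, prior_sigma=1.0):
    # KL( N(mu, sigma_q^2) || N(0, prior_sigma^2) ), summed over topic
    # dimensions and averaged over the batch. Assumes logsigma_theta is a
    # log-variance; prior_sigma is the hypothetical knob discussed above.
    var_p = prior_sigma ** 2
    kl = 0.5 * torch.sum(
        (logsigma_theta.exp() + mu_theta.pow(2)) / var_p
        - 1.0
        - logsigma_theta
        + math.log(var_p),
        dim=-1,
    )
    return kl.mean()
```

With prior_sigma < 1 the posterior logits are pulled harder toward zero, and the softmax of near-zero logits spreads mass across more topics, which is the smoothing effect described above.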

ydennisy commented 3 years ago

@WalterKung you mentioned you are using get_theta(normalized_data_batch) to get the topic distribution - is this the correct way?

There are quite a few questions in this repo on how to predict on new data: https://github.com/adjidieng/ETM/issues/4
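
For anyone else landing here, this is the pattern I have been trying for a single new document (a sketch only; it assumes the vocabulary list saved at training time and that get_theta returns (theta, kld_theta), with the encoder returning the posterior mean in eval mode):

```python
import torch

def infer_theta(model, vocab, tokens):
    # Build a bag-of-words vector over the *training* vocabulary;
    # out-of-vocabulary tokens are simply dropped.
    word2id = {w: i for i, w in enumerate(vocab)}
    bow = torch.zeros(1, len(vocab))
    for tok in tokens:
        if tok in word2id:
            bow[0, word2id[tok]] += 1.0

    # Normalize counts to proportions, matching what the encoder expects.
    normalized_bow = bow / bow.sum(1, keepdim=True)

    model.eval()  # avoid sampling noise: eval mode should give the posterior mean
    with torch.no_grad():
        theta, _ = model.get_theta(normalized_bow)
    return theta.squeeze(0)  # [num_topics] proportions summing to 1
```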

Thanks in advance!