Thanks for the interesting paper and great repository. I have a few clarification questions regarding the method and the code that I was wondering if you could help me with. Thanks in advance!
Section 4.2 of the paper (arXiv version) states:
"We choose a Gaussian distribution q(\eta_t | \eta_{1:t-1}, \tilde{w}_t), whose mean and covariance are given by the output of the LSTM."
However, in this repository, the LSTM takes only \tilde{w}_t as input, not \eta_{1:t-1}
(https://github.com/adjidieng/DETM/blob/master/detm.py#L130).
Rather, \eta_{t-1} is only used AFTER the LSTM (https://github.com/adjidieng/DETM/blob/master/detm.py#L146), through concatenation with the LSTM output. In this way, the LSTM can only capture the temporal dependency of \tilde{w}, not the temporal dependency of \eta. I have probably missed something, but I wonder if you could help me understand the intuition behind this. Thank you.
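To make sure I am reading the code correctly, here is a minimal, self-contained sketch of the structure I am describing. The module names and dimensions are hypothetical stand-ins, not the repository's actual code; I only want to check that the dataflow matches the linked lines.

```python
import torch
import torch.nn as nn

# Toy dimensions, for illustration only.
num_times, num_topics, vocab_size, hidden_size = 5, 10, 200, 64

# Hypothetical modules standing in for the ones in detm.py.
q_eta_map = nn.Linear(vocab_size, hidden_size)            # embeds \tilde{w}_t
q_eta_lstm = nn.LSTM(hidden_size, hidden_size)            # recurs over \tilde{w}_{1:T} only
mu_q_eta = nn.Linear(hidden_size + num_topics, num_topics)
logsigma_q_eta = nn.Linear(hidden_size + num_topics, num_topics)

rnn_inp = torch.rand(num_times, vocab_size)               # \tilde{w}_{1:T}

# (1) The LSTM sees only \tilde{w}_{1:T}; \eta never enters the recurrence.
output, _ = q_eta_lstm(q_eta_map(rnn_inp).unsqueeze(1))
output = output.squeeze(1)                                # (num_times, hidden_size)

# (2) \eta_{t-1} is concatenated with the LSTM output AFTER the recurrence,
#     just before the two linear layers that produce the Gaussian parameters.
etas = torch.zeros(num_times, num_topics)
prev_eta = torch.zeros(num_topics)                        # zeros used in place of \eta_0 at t = 1
for t in range(num_times):
    inp_t = torch.cat([output[t], prev_eta], dim=0)
    mu_t = mu_q_eta(inp_t)
    logsigma_t = logsigma_q_eta(inp_t)
    etas[t] = mu_t + torch.exp(0.5 * logsigma_t) * torch.randn(num_topics)
    prev_eta = etas[t]
```

If this reading is right, the only path by which \eta_{t-1} influences \eta_t in the variational posterior is this post-LSTM concatenation.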
In the D-LDA paper (Dynamic Topic Models, Blei & Lafferty 2006), the method is able to perform "future" prediction (Fig. 5 of that paper). With DETM, on the other hand, I wonder whether the dependency on \tilde{w}_t in q(\eta_t | \eta_{1:t-1}, \tilde{w}_t) prevents DETM from doing future prediction, since it uses words from the future time step (\tilde{w}_t). Thank you!
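To make the question concrete: since \tilde{w}_{T+1} is unavailable for a future time step, would the intended way to forecast be to drop the variational posterior and roll the prior transition \eta_t \sim N(\eta_{t-1}, \delta^2 I) forward from the last inferred \eta_T? Something along these lines (the helper below is hypothetical and not from the repository; the delta value is a placeholder):

```python
import torch

def forecast_eta(last_eta: torch.Tensor, delta: float, num_future: int, num_samples: int = 100):
    """Hypothetical helper: roll the prior transition eta_t ~ N(eta_{t-1}, delta^2 I)
    forward from the last inferred eta_T, since \tilde{w} is unavailable for future steps."""
    num_topics = last_eta.shape[-1]
    samples = last_eta.expand(num_samples, num_topics).clone()
    forecasts = []
    for _ in range(num_future):
        samples = samples + delta * torch.randn_like(samples)  # one prior transition step
        forecasts.append(samples.mean(0))                      # Monte Carlo mean of eta_{T+k}
    return torch.stack(forecasts)                              # (num_future, num_topics)

# Usage: forecast 3 future steps from the last inferred eta_T (placeholder tensor here).
eta_T = torch.zeros(10)
future_etas = forecast_eta(eta_T, delta=0.005, num_future=3)
```

Or is there another mechanism for future prediction that I am missing?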