cindyxinyiwang / deep-latent-sequence-model

PyTorch implementation of "A Probabilistic Formulation of Unsupervised Text Style Transfer" by He et al., ICLR 2020

A question about language model priors #13

Closed: seongminp closed this issue 3 years ago

seongminp commented 3 years ago

Hi. Thank you for such an exciting paper!

I would appreciate it greatly if you could shed some light on these:

  1. The biggest question for me is how exactly we align an observed sequence in domain 1 (D1) with its corresponding latent sequence in domain 2 (D2). I guess the alignment is found by optimizing the KL regularizer (equation 3 in the paper).
     In the log_prior calculation, https://github.com/cindyxinyiwang/deep-latent-sequence-model/blob/8a798582b1af5ef7f6ac4ca1f2138fd382a1cb06/src/model.py#L339

the log prior is computed as a combination of the outputs of both LMs. Is there a reason you compute KL = E_{z ~ q(z|x, y)}[log q(z|x, y) - log p(z|y)] instead of KL = E_{z ~ q(z|x)}[log q(z|x) - log p(z)], as in the paper? (See the first sketch after this list.)

  2. When loading the training data, is there a reason y is sampled as 1 - y_train (see the second snippet after this list)? https://github.com/cindyxinyiwang/deep-latent-sequence-model/blob/8a798582b1af5ef7f6ac4ca1f2138fd382a1cb06/src/data_utils.py#L99

  3. Is there a reason only the latent y (and not x, from the other domain) is sampled during data loading? From equation 3 I presumed we sample from both domains to compare against each LM separately.
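
To make question 1 concrete, here is the one-sample KL estimate I have in mind. This is only a rough sketch of my reading; every name in it (kl_one_sample, log_q_z, log_p_target_only, log_p_both_lms) is hypothetical and not taken from the repo:

```python
import torch

# One-sample Monte Carlo estimate of KL(q || p) = E_{z~q}[log q(z) - log p(z)],
# where z is a sampled latent sequence and p is a language-model prior.
def kl_one_sample(log_q_z: torch.Tensor, log_prior_z: torch.Tensor) -> torch.Tensor:
    # log_q_z:     (batch,) log-probability of the sampled z under q
    # log_prior_z: (batch,) log-probability of the same z under the prior LM
    return (log_q_z - log_prior_z).mean()

# Toy numbers, just to show the shapes involved.
log_q_z = torch.randn(4)

# Variant (a), how I read equation 3 in the paper: the prior is the single
# target-domain LM, i.e. log p(z).
log_p_target_only = torch.randn(4)

# Variant (b), how I read the log_prior computation in src/model.py:
# the prior score combines the outputs of both domain LMs, i.e. log p(z | y).
log_p_both_lms = torch.randn(4)

print(kl_one_sample(log_q_z, log_p_target_only))  # KL as I understand the paper
print(kl_one_sample(log_q_z, log_p_both_lms))     # KL as I understand the code
```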
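
And for question 2, this is how I currently understand the 1 - y_train line; again, this is just my interpretation, not code from data_utils.py:

```python
# My reading: y_train is the binary domain label (0 or 1) of the observed
# sentence x, so 1 - y_train flips it to the *other* domain, i.e. the domain
# the latent sequence is supposed to be transferred into. Is that the intent?
y_train = 0
y_latent = 1 - y_train  # 0 -> 1, 1 -> 0
```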

Thank you!

seongminp commented 3 years ago

duplicate of #14