cindyxinyiwang / deep-latent-sequence-model

PyTorch implementation of "A Probabilistic Formulation of Unsupervised Text Style Transfer" by He et al., ICLR 2020

A question about language model priors #13

Closed: seongminp closed this issue 3 years ago

seongminp commented 3 years ago

Hi. Thank you for such an exciting paper!

I would appreciate it greatly if you could shed some light on these:

  1. The biggest question for me is how exactly we align an observed sequence in domain 1 (D1) with its corresponding latent sequence in domain 2 (D2). I guess the alignment is found by optimizing the KL regularizer (equation 3 in the paper).
     In the log_prior calculation, https://github.com/cindyxinyiwang/deep-latent-sequence-model/blob/8a798582b1af5ef7f6ac4ca1f2138fd382a1cb06/src/model.py#L339

the log prior is computed as a combination of the outputs of both LMs. Is there a reason you compute KL = E_{z ~ q(z|x, y)}[log q(z|x, y) - log p(z|y)] instead of KL = E_{z ~ q(z|x)}[log q(z|x) - log p(z)], as in the paper? (See the first sketch after this list.)

  2. When loading the training data, is there a reason y is sampled as 1 - y_train (see the second snippet after this list)? https://github.com/cindyxinyiwang/deep-latent-sequence-model/blob/8a798582b1af5ef7f6ac4ca1f2138fd382a1cb06/src/data_utils.py#L99

  3. Is there a reason only the latent y (and not x, from the other domain) is sampled during data loading? From equation 3 I presumed we sample from both domains to compare against each LM separately.
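
To make question 1 concrete, here is the one-sample KL estimate I have in mind. This is only a rough sketch of my reading; every name in it (kl_one_sample, log_q_z, log_p_target_only, log_p_both_lms) is hypothetical and not taken from the repo:

```python
import torch

# One-sample Monte Carlo estimate of KL(q || p) = E_{z~q}[log q(z) - log p(z)],
# where z is a sampled latent sequence and p is a language-model prior.
def kl_one_sample(log_q_z: torch.Tensor, log_prior_z: torch.Tensor) -> torch.Tensor:
    # log_q_z:     (batch,) log-probability of the sampled z under q
    # log_prior_z: (batch,) log-probability of the same z under the prior LM
    return (log_q_z - log_prior_z).mean()

# Toy numbers, just to show the shapes involved.
log_q_z = torch.randn(4)

# Variant (a), how I read equation 3 in the paper: the prior is the single
# target-domain LM, i.e. log p(z).
log_p_target_only = torch.randn(4)

# Variant (b), how I read the log_prior computation in src/model.py:
# the prior score combines the outputs of both domain LMs, i.e. log p(z | y).
log_p_both_lms = torch.randn(4)

print(kl_one_sample(log_q_z, log_p_target_only))  # KL as I understand the paper
print(kl_one_sample(log_q_z, log_p_both_lms))     # KL as I understand the code
```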
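
And for question 2, this is how I currently understand the 1 - y_train line; again, this is just my interpretation, not code from data_utils.py:

```python
# My reading: y_train is the binary domain label (0 or 1) of the observed
# sentence x, so 1 - y_train flips it to the *other* domain, i.e. the domain
# the latent sequence is supposed to be transferred into. Is that the intent?
y_train = 0
y_latent = 1 - y_train  # 0 -> 1, 1 -> 0
```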

Thank you!

seongminp commented 3 years ago

duplicate of #14