google-research / torchsde

Differentiable SDE solvers with GPU support and efficient sensitivity analysis.
Apache License 2.0

Clarification on logqp_path #82

Closed JurijsNazarovs closed 3 years ago

JurijsNazarovs commented 3 years ago

Hello,

Thanks for the work, it looks great. However, I have a question about logqp_path and its role in the KL term in the latent_sde.py example. I have read the paper "Neural SDE: Stabilizing Neural ODE ..." and went through your code, but it is still not clear to me why logqp_path is part of the KL term in addition to logqp0, and why we compute it as 1/2*((f-h)/g)^2. It looks like the log of a Gaussian pdf, but I could not make the connection. I would appreciate it if you could clarify how logqp_path is derived.

Jurijs

lxuechen commented 3 years ago

Hi Jurijs,

Thanks for the interest. The term you're mentioning (more precisely, an expected integral of 1/2*((f-h)/g)^2 over time; the integrand on its own doesn't have much meaning) is in fact a Monte Carlo estimator of a KL divergence on path space. There are two SDEs involved here: one we call the prior, and the other the approximate posterior.

Each of the two SDEs induces its own solution, each of which is a stochastic process and therefore defines a distribution over a space of functions (e.g. C([0, 1], R^d)). With these two distributions, we can define a KL divergence and estimate it by Monte Carlo. That procedure yields this term.

Admittedly, what I'm claiming here glosses over some technical aspects, and I'd recommend Section 9.6 of our paper (and the relevant section in the main text) for a detailed derivation. The neural SDE paper by Tzen & Raginsky also has relevant information. The chapter on Girsanov's theorem/likelihood ratios of Itô processes in the Applied SDE book is relevant as well; its derivation is largely heuristic and a bit loose on the math details, but since it views SDEs as limits of scaled Gaussian increments, it is by far the most approachable.
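
For concreteness, here's a minimal sketch of where that estimator comes from under an Euler-Maruyama discretization. This is illustrative rather than the library's internals: `f`, `h`, `g` stand in for the posterior drift, prior drift, and shared diffusion, and `ts` is a 1-D tensor of times.

```python
import torch

def path_kl_estimate(f, h, g, y0, ts):
    """One Euler-Maruyama sample of the path-space KL between the approx.
    posterior SDE dY = f dt + g dW and the prior SDE dY = h dt + g dW."""
    y = y0                                        # state, shape (batch, d)
    kl = torch.zeros(y0.shape[0])
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0
        fy, hy, gy = f(t0, y), h(t0, y), g(t0, y)
        u = (fy - hy) / gy                        # drift mismatch, in units of the noise
        kl = kl + 0.5 * (u ** 2).sum(dim=1) * dt  # Riemann sum of the integrand
        y = y + fy * dt + gy * torch.randn_like(y) * dt.sqrt()
    return y, kl
```

Simulating under the approximate posterior while accumulating 1/2*((f-h)/g)^2 along the sampled path is essentially the bookkeeping behind logqp_path; averaging `kl` over many paths gives the Monte Carlo estimate.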

lxuechen commented 3 years ago

Closing this now. Feel free to reopen if there are additional questions.

JurijsNazarovs commented 3 years ago

Hi Xuechen,

Thanks for the response, it was very helpful. I have two additional questions regarding your paper/code; it would be wonderful if you could reply.

1) In Section 9.6.1 the derivation of the ELBO ends up with an expected value w.r.t. the P-law, which is the law of the prior. However, in Section 5 and in the code, the expectation is computed w.r.t. the approximate posterior distribution Q.
2) Setting question 1) aside, I understand the derivation of the ELBO and the appearance of logqp_path in the code. However, it is not clear to me where the standard KL term between prior and posterior, denoted logqp0 in the code, appears in the ELBO.

Jurijs

lxuechen commented 3 years ago

> In Section 9.6.1 the derivation of the ELBO ends up with an expected value w.r.t. the P-law, which is the law of the prior. However, in Section 5 and in the code, the expectation is computed w.r.t. the approximate posterior distribution Q.

P and Q in our paper are probability measures on the underlying probability space. Under P, W_t is a standard Wiener process (whereas hat{W_t} is not). The measure P is not itself the law/distribution induced by the stochastic process; we call the latter the P-law of the process when P is the underlying measure (similarly, under Q, the law induced by the process is its Q-law). Clearly, if you change the underlying measure on the probability space to something other than P, W_t may no longer be a standard Wiener process, and hence the distribution on path space induced by the SDE would also change.

So under P, W_t is a standard Wiener process, and we can therefore simulate the SDE with drift h_phi and diffusion sigma using sdeint.
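
As a rough sketch of how that looks in code. The networks here are placeholders, much simpler than those in latent_sde.py, and I'm assuming the `logqp=True` interface of `sdeint`, which expects the module to define `f` (posterior drift), `g` (diffusion), and `h` (prior drift):

```python
import torch
import torchsde

class ToyLatentSDE(torch.nn.Module):
    noise_type = "diagonal"
    sde_type = "ito"

    def __init__(self, d=4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(d + 1, 64), torch.nn.Tanh(), torch.nn.Linear(64, d))

    def f(self, t, y):  # approx. posterior drift
        tt = torch.full((y.size(0), 1), float(t))  # broadcast time over the batch
        return self.net(torch.cat([y, tt], dim=1))

    def h(self, t, y):  # prior drift, e.g. an Ornstein-Uhlenbeck pull toward zero
        return -y

    def g(self, t, y):  # diffusion shared by prior and posterior
        return 0.5 * torch.ones_like(y)

sde = ToyLatentSDE()
y0 = torch.randn(8, 4)                 # z0 samples, e.g. from an encoder
ts = torch.linspace(0., 1., 20)
ys, log_ratio = torchsde.sdeint(sde, y0, ts, logqp=True, method="euler")
kl_path = log_ratio.sum(dim=0)         # per-sample path KL (logqp_path)
```

Because the simulation uses `f` while the KL bookkeeping uses `(f - h) / g`, everything is computed along paths drawn from the approximate posterior, which is exactly the expectation w.r.t. Q you asked about.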

> Setting question 1) aside, I understand the derivation of the ELBO and the appearance of logqp_path in the code. However, it is not clear to me where the standard KL term between prior and posterior, denoted logqp0 in the code, appears in the ELBO.

The KL divergence at time t=0 was omitted in the paper to save space.
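
Concretely, it's the ordinary Gaussian KL between the encoder's distribution over the initial state and the prior over the initial state, along these lines (variable names are illustrative, not the exact ones in the example):

```python
import torch
from torch.distributions import Normal, kl_divergence

qz0_mean = torch.zeros(8, 4)    # encoder outputs (placeholders)
qz0_logstd = torch.zeros(8, 4)

qz0 = Normal(loc=qz0_mean, scale=qz0_logstd.exp())
pz0 = Normal(loc=0., scale=1.)               # standard Gaussian prior on z0
logqp0 = kl_divergence(qz0, pz0).sum(dim=1)  # KL(q(z0) || p(z0)) per sample
```

The full KL in the ELBO is then logqp0 plus the path term logqp_path.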

JurijsNazarovs commented 3 years ago

Ok, everything is clear now. Thank you very much for your time.

Have a good day, Jurijs