blei-lab / edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
http://edwardlib.org

rejection sampling variational inference #379

Open dustinvtran opened 7 years ago

dustinvtran commented 7 years ago

arXiv paper

looping in @naesseth

jumpynitro commented 7 years ago

Is this going to be implemented?

dustinvtran commented 7 years ago

@naesseth is planning to, although I think he's been busy of late (pinging him so he can give his own response).

naesseth commented 7 years ago

Hi, yes sadly I have become tied up with other things. But the plan is still to implement it in Edward, most likely before the camera-ready deadline for AISTATS.

jumpynitro commented 7 years ago

Oh great, that's good news! Thank you.

naesseth commented 7 years ago

The camera-ready deadline for AISTATS was a bit earlier than anticipated, so it will not be ready in time for that. I have, however, provided the code that we used for the experiments in the paper here.

cavaunpeu commented 7 years ago

Hey @dustinvtran. Is this still up for grabs? Once finished with a work project, I'd like to give it a shot.

dustinvtran commented 7 years ago

Yep.

cavaunpeu commented 6 years ago

Hey @dustinvtran. Getting ready to implement this.

It seems that the Reparameterization*KLqp classes compute an estimate of the gradient of the ELBO via Monte Carlo integration w.r.t. the latent variable $z \sim q(z; \eta)$, as opposed to w.r.t. $\epsilon$ as done in the paper. For instance, in build_reparam_loss_and_gradients.

For RSVI, we compute this expectation via integration w.r.t. $\epsilon$, the variable accepted in the rejection-sampling step. Is this what I should strive for in the implementation? It would seem inconsistent with how the existing classes are implemented.
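
To make sure I'm describing the distinction correctly, here's a toy sketch (plain numpy/autograd, not Edward code, with a made-up Normal example) of the $\epsilon$-formulation: draw $\epsilon$ from a fixed distribution, push it through $z = h(\epsilon, \eta)$, and differentiate the Monte Carlo average directly w.r.t. $\eta$:

```python
import numpy.random as npr
import autograd.numpy as np
from autograd import grad

def f(z):
    # Toy integrand standing in for the ELBO integrand.
    return z ** 2

def estimate_wrt_eps(eta, n_samples=1000):
    # E_{q(z; eta)}[f(z)] rewritten as E_{s(eps)}[f(h(eps, eta))], with
    # z = h(eps, eta) = mu + sigma * eps and eps ~ N(0, 1) fixed w.r.t. eta.
    mu, log_sigma = eta[0], eta[1]
    eps = npr.randn(n_samples)
    z = mu + np.exp(log_sigma) * eps
    return np.mean(f(z))

# Because the eps draws do not depend on eta, the Monte Carlo average can be
# differentiated directly w.r.t. eta (the usual reparameterization gradient).
eta = np.array([0.5, -1.0])
print(grad(estimate_wrt_eps)(eta))
```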

naesseth commented 6 years ago

@cavaunpeu I'm unfamiliar with the specifics of Edward, but I think it should be possible to do the same for RSVI. The transformation I propose for the Gamma special case in my paper is invertible. In fact I make use of that in my Python/autograd implementation here. The inverse of the transformation is given by the function calc_epsilon.
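
For reference, both directions are one-liners; a minimal numpy sketch (the name `h` is just shorthand here, and `calc_epsilon` is the inverse as in my code) looks like this:

```python
import numpy as np

def h(epsilon, alpha):
    # Marsaglia & Tsang (2000) shape augmentation for Gamma(alpha, 1), alpha >= 1:
    # z = h(epsilon, alpha) = (alpha - 1/3) * (1 + epsilon / sqrt(9*alpha - 3))^3.
    return (alpha - 1. / 3) * (1. + epsilon / np.sqrt(9. * alpha - 3.)) ** 3

def calc_epsilon(z, alpha):
    # Inverse of h: recover the accepted epsilon from a Gamma(alpha, 1) draw z.
    return np.sqrt(9. * alpha - 3.) * ((z / (alpha - 1. / 3)) ** (1. / 3) - 1.)

# Round trip: the two maps invert each other.
alpha, eps = 2.5, 0.3
assert np.isclose(calc_epsilon(h(eps, alpha), alpha), eps)
```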

cavaunpeu commented 6 years ago

Hey @naesseth! What do you mean by "do the same"? Integrate w.r.t. epsilon as you do in the paper?

naesseth commented 6 years ago

@cavaunpeu You mentioned that in Edward the expectation is computed w.r.t. the latent variable z, whereas in my paper I focus on formulating the problem in epsilon. For the Gamma rejection sampler reparameterization it is a straightforward change of variables from epsilon to z, so you could implement RSVI with expectations w.r.t. z as well as epsilon.
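
Concretely, reusing the `h`/`calc_epsilon` sketch from my previous comment: draw z with the usual Gamma sampler, map it back to the accepted epsilon, and the same Monte Carlo average can be written in either variable:

```python
import numpy as np

# assumes h(epsilon, alpha) and calc_epsilon(z, alpha) as sketched above

alpha = 2.5
z = np.random.gamma(alpha, size=10000)   # z ~ q(z; alpha) = Gamma(alpha, 1)
eps = calc_epsilon(z, alpha)             # change of variables: z -> epsilon
assert np.allclose(h(eps, alpha), z)     # z = h(eps, alpha), up to rounding

# The same expectation, written with respect to either variable:
print(np.mean(np.log(z)), np.mean(np.log(h(eps, alpha))))
```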

cavaunpeu commented 6 years ago

@naesseth

Ah, got it. Thanks!

So, before I implement, I just want to make sure I understand everything clearly. Is the following accurate?

naesseth commented 6 years ago

@cavaunpeu Sounds about right. The requirement for using reparameterization-type gradients is slightly more subtle, but differentiability is a sufficient condition. If you'd like to know more, these issues basically come down to the circumstances under which integration and differentiation can be interchanged.
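
Roughly, the kind of statement I have in mind is the dominated form of differentiation under the integral sign: if the integrand is differentiable in $\theta$ and its $\theta$-derivative is bounded by an integrable function, then

$$\frac{\partial}{\partial\theta} \int f(\epsilon, \theta)\, d\epsilon = \int \frac{\partial f(\epsilon, \theta)}{\partial\theta}\, d\epsilon, \qquad \text{provided } \left|\frac{\partial f(\epsilon, \theta)}{\partial\theta}\right| \le g(\epsilon) \text{ with } \int g(\epsilon)\, d\epsilon < \infty.$$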

cavaunpeu commented 6 years ago

Ah, cool! Could you provide a reference? I'm interested in learning more.

cavaunpeu commented 6 years ago

@naesseth

So, I'm going to go ahead and implement this in two ways:

  1. Equation 7, for when h is not invertible (and a proposal distribution r is necessarily provided).
  2. Equation 8, for when h is invertible (as in the case of the Marsaglia and Tsang Gamma distribution).

Please interject if this sounds wrong; a rough sketch of case 2 follows below. I'll continue to leave questions here. Thanks so much for the help thus far.
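
To be concrete about case 2, here is a rough standalone sketch (plain autograd, nothing Edward-specific; names like `rsvi_gradient` and `log_ratio` are just placeholders of mine) of what I understand the estimator to be: the reparameterization term through h plus a correction term for the accept/reject step. Please correct me if this misreads the paper.

```python
import numpy.random as npr
import autograd.numpy as np
from autograd import grad
from autograd.scipy.special import gammaln

def h(eps, alpha):
    # Marsaglia-Tsang transform: accepted eps -> z ~ Gamma(alpha, 1).
    return (alpha - 1. / 3) * (1. + eps / np.sqrt(9. * alpha - 3.)) ** 3

def calc_epsilon(z, alpha):
    # Inverse of h: Gamma(alpha, 1) draw z -> the accepted eps.
    return np.sqrt(9. * alpha - 3.) * ((z / (alpha - 1. / 3)) ** (1. / 3) - 1.)

def log_q(z, alpha):
    # log density of Gamma(alpha, 1).
    return (alpha - 1.) * np.log(z) - z - gammaln(alpha)

def log_ratio(eps, alpha):
    # log[ q(h(eps, alpha); alpha) / r(h(eps, alpha); alpha) ]
    # = log q(h(eps, alpha); alpha) + log|dh/deps|, dropping the alpha-free
    #   log s(eps) term (it vanishes under the gradient anyway).
    z = h(eps, alpha)
    dh_deps = grad(h, 0)(eps, alpha)
    return log_q(z, alpha) + np.log(dh_deps)

def rsvi_gradient(f, eps, alpha):
    # Single-sample estimate of d/d(alpha) E_{q(z; alpha)}[f(z)]:
    #   g_rep -- differentiate f(h(eps, alpha)) through the transform,
    #   g_cor -- score-style correction for the accept/reject step.
    g_rep = grad(lambda a: f(h(eps, a)))(alpha)
    g_cor = f(h(eps, alpha)) * grad(log_ratio, 1)(eps, alpha)
    return g_rep + g_cor

# Toy usage: E[log z] under Gamma(alpha, 1) is digamma(alpha), so this noisy
# single-sample estimate should fluctuate around polygamma(1, alpha).
alpha = 2.0
z = npr.gamma(alpha)             # draw z ~ Gamma(alpha, 1) with the usual sampler
eps = calc_epsilon(z, alpha)     # map it back to the accepted epsilon
print(rsvi_gradient(np.log, eps, alpha))
```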

naesseth commented 6 years ago

@cavaunpeu Sounds good, I'll try to answer any follow up questions as soon as I can.

A reference can be found e.g. here; however, it seems PlanetMath is down at the moment.