dustinvtran opened 7 years ago
Is this going to be implemented?
@naesseth is planning to, although I think he's been busy of late (pinging him so he can give his own response).
Hi, yes sadly I have become tied up with other things. But the plan is still to implement it in Edward, most likely before the camera-ready deadline for AISTATS.
Oh great, that's good news! Thank you.
The camera-ready deadline for AISTATS was a bit earlier than anticipated, so it will not be ready in time for that. I have, however, provided the code that we used for the experiments in the paper here.
Hey @dustinvtran. Is this still up for grabs? Once finished with a work project, I'd like to give it a shot.
Yep.
Hey @dustinvtran. Getting ready to implement this.
It seems that the Reparameterization*KLqp classes compute an estimate of the gradient of the ELBO via MC integration w.r.t. the latent variable z, as opposed to w.r.t. the noise variable Ɛ as done in the paper; see, for instance, build_reparam_loss_and_gradients. For RSVI, we compute this expectation via integration w.r.t. the accepted variable (in the rejection-sampling step) Ɛ. Is this what I should strive for in the implementation? It would seem inconsistent with how you've implemented the above.
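For concreteness, here is the distinction I mean, in toy NumPy code (my own sketch of the standard Gaussian reparameterization, not Edward's implementation): the noise Ɛ is what stays fixed, and the MC average is taken over draws of Ɛ rather than z.

```python
import numpy as np

# Toy sketch (not Edward code): the reparameterization gradient for a
# Gaussian q(z; mu, sigma), written as an MC average over the noise
# eps ~ N(0, 1), with z = h(eps; theta) = mu + sigma * eps.

def f(z):                       # stand-in integrand for an ELBO term
    return z ** 2               # E[f(z)] = mu**2 + sigma**2

mu, sigma = 1.0, 0.5
eps = np.random.randn(100000)   # noise draws, fixed w.r.t. (mu, sigma)
z = mu + sigma * eps            # z = h(eps; theta)

grad_mu = np.mean(2.0 * z)          # E[f'(z) * dh/dmu];    truth: 2*mu    = 2.0
grad_sigma = np.mean(2.0 * z * eps) # E[f'(z) * dh/dsigma]; truth: 2*sigma = 1.0
print(grad_mu, grad_sigma)
```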
@cavaunpeu I'm unfamiliar with the specifics of Edward, but I think it should be possible to do the same for RSVI. The transformation I propose for the Gamma special case in my paper is invertible. In fact I make use of that in my Python/autograd implementation here. The inverse of the transformation is given by the function calc_epsilon.
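Concretely, the mapping and its inverse look like this (a quick NumPy sketch; the function names are mine rather than those in the repo):

```python
import numpy as np

# The Marsaglia-Tsang mapping for Gamma(alpha, 1) and its inverse
# (calc_epsilon in the linked code computes the latter).

def h(eps, alpha):
    """z = h(eps; alpha): maps standard-normal noise to the Gamma proposal."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    return d * (1.0 + c * eps) ** 3

def h_inverse(z, alpha):
    """eps = h^{-1}(z; alpha): recovers the noise from a sample z."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    return (np.cbrt(z / d) - 1.0) / c

# Round trip: h_inverse(h(eps)) == eps.
alpha, eps = 2.0, 0.3
assert np.isclose(h_inverse(h(eps, alpha), alpha), eps)
```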
Hey @naesseth! What do you mean by "do the same"? Integrate w.r.t. epsilon as you do in the paper?
@cavaunpeu You mentioned that in Edward the expectation is computed w.r.t. the latent variable z, whereas in my paper I focus on formulating the problem in Ɛ. For the Gamma rejection-sampler reparameterization it is a straightforward change of variables from Ɛ to z, so you could implement RSVI with expectations w.r.t. z as well as Ɛ.
@naesseth Ah, got it. Thanks!
So, before I implement, I just want to make sure I understand everything clearly. Is the following accurate?
- For each variational parameter Θ of our model, we compute the gradient of the ELBO w.r.t. Θ, then update Θ via SGD.
- The ∇_Θ h(Ɛ; Θ) term therein, i.e. the gradient of the deterministic mapping function h through which we generate samples from our (reparameterized) variational distribution q, must by definition be differentiable w.r.t. Θ. For many variational distributions, like Dirichlet and Gamma, this is not the case.
- Dirichlet, Gamma, etc. objects in Edward could, in theory, have samplers with a mapping function h that is differentiable w.r.t. the variational parameters Θ. However, they probably don't.
- RSVI gives an estimate of this gradient for Dirichlet, Gamma, and other distributions, computed via integration w.r.t. the accepted variable Ɛ. While the g_rep term in this estimate contains ∇_Θ h(Ɛ; Θ), we took care to use a deterministic mapping function h that is indeed differentiable w.r.t. Θ in the rejection sampler (see the sketch after this list).
- Additionally, we can compute this expectation w.r.t. z ~ q(z; Θ) if h is invertible, as z = h(Ɛ; Θ). Then it's easy: just follow Algorithm 2.
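Here is a toy autograd sketch of how I currently read the estimator for the Gamma(α, 1) case (my own code and naming, following the paper's g = g_rep + g_cor decomposition; please correct me if I've mangled something):

```python
import autograd.numpy as np
from autograd import grad
from autograd.scipy.special import gammaln

# Toy sketch of g = g_rep + g_cor for d/d_alpha E_{Gamma(z; alpha, 1)}[f(z)],
# with the Marsaglia-Tsang sampler z = h(eps; alpha), eps ~ N(0, 1),
# including its accept/reject step.

def f(z):                                # toy integrand; E[f] = digamma(alpha)
    return np.log(z)

def h(eps, alpha):                       # differentiable w.r.t. alpha
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    return d * (1.0 + c * eps) ** 3

def log_q(z, alpha):                     # Gamma(alpha, 1) log-density
    return (alpha - 1.0) * np.log(z) - z - gammaln(alpha)

def log_dh_deps(eps, alpha):             # log |dh/d_eps| = log(sqrt(d) * (1 + c*eps)^2)
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    return 0.5 * np.log(d) + 2.0 * np.log(1.0 + c * eps)

def sample_accepted_eps(alpha, n):
    # Marsaglia-Tsang accept/reject: keep the eps that pass the test.
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    eps, u = np.random.randn(n), np.random.rand(n)
    v = (1.0 + c * eps) ** 3
    ok = (v > 0) & (np.log(u) < 0.5 * eps**2 + d - d * v
                    + d * np.log(np.where(v > 0, v, 1.0)))
    return eps[ok]

np.random.seed(0)
alpha = 2.0
eps = sample_accepted_eps(alpha, 200000)

# g_rep: differentiate f through the sampler, holding the accepted eps fixed.
g_rep = grad(lambda a: np.mean(f(h(eps, a))))(alpha)

# g_cor: score-style correction; f(z) is held fixed, and the log-density of
# the accepted eps, log pi = log s(eps) + log q(h(eps; a); a) + log |dh/d_eps|,
# is differentiated (the log s(eps) term is constant in alpha and drops out).
z = h(eps, alpha)
g_cor = grad(lambda a: np.mean(f(z) * (log_q(h(eps, a), a)
                                       + log_dh_deps(eps, a))))(alpha)

print(g_rep + g_cor)   # should be close to trigamma(2) = pi**2/6 - 1 ~= 0.6449
```

(I've ignored the α < 1 case here, where one would need the usual shape-augmentation trick.)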
@cavaunpeu Sounds about right. The requirement for using reparameterization-type gradients is slightly more subtle, but differentiability is a sufficient condition. If you'd like to know more about these issues, it basically comes down to the circumstances under which we can interchange integration and differentiation.
Ah, cool! Could you provide a reference? I'm interested to learn more.
@naesseth So, I'm going to go ahead and implement this in two ways:
1. One where h is not invertible (and a proposal distribution r is necessarily provided).
2. One where h is invertible (as in the case of the Marsaglia and Tsang Gamma sampler); see the sketch below.

Please interject if this sounds wrong. I will continue to leave questions here. Thanks so much for the help thus far.
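For (2), the core trick as I understand it (again a toy NumPy sketch with my own names, following Algorithm 2): sample z with any exact Gamma sampler, then recover the accepted noise via the inverse mapping.

```python
import numpy as np

# Toy sketch of the invertible path (Algorithm 2): reuse any exact Gamma
# sampler as a black box, then recover the accepted noise afterwards via
# the inverse Marsaglia-Tsang mapping.

def h_inverse(z, alpha):
    d = alpha - 1.0 / 3.0
    return (np.cbrt(z / d) - 1.0) * np.sqrt(9.0 * d)

alpha = 2.0
z = np.random.gamma(alpha, 1.0, size=100000)  # existing sampler; no internals needed
eps = h_inverse(z, alpha)                     # distributed as the accepted noise
# eps can now be fed to the g_rep / g_cor estimator from my earlier sketch.
```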
arXiv paper. Looping in @naesseth.