simsurace opened this issue 2 years ago
Hi @simsurace,
Sorry I did not answer earlier (and save you some time), but I got hit by Covid. Actually, `aux_posterior!` should not be differentiated! It should be ignored during the AD pass. When using Zygote, I wrap the block in a `Zygote.@ignore`; I don't know if it's possible to do the same with ForwardDiff, though. The reason is that the `aux_posterior` step is already an implicit optimization.
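For concreteness, here is a minimal sketch of that pattern, assuming the `aux_posterior(lik, y, qf)` signature quoted later in this thread; `augmented_elbo_terms` is a hypothetical placeholder for assembling the ELBO, not the package's actual API:

```julia
using Zygote

function elbo_sketch(lik, y, qf)
    # `aux_posterior` is itself an implicit (CAVI) optimization, so its result
    # is treated as a constant during the AD pass.
    qΩ = Zygote.@ignore aux_posterior(lik, y, qf)
    # Hypothetical placeholder: assemble the augmented ELBO from qΩ, qf, y.
    return augmented_elbo_terms(lik, qΩ, y, qf)
end
```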
Thanks, I did not notice that this is an implicit optimization. So is it independent of the hyperparameters? If so, this issue and PR #77 would be unnecessary. I will give it a try and see if the results change! Sorry to hear about your illness, hope you get better soon.
If it works out with the ignore statements, I could convert the PR into a documentation thing where this is explained.
That's an interesting question; it actually depends on the parametrization. Right now I am parametrizing directly with the mean `m` and covariance `S`. But one could parametrize the covariance as $(K^{-1} + D)^{-1}$ (and similarly for the mean); with that parametrization one could optimize the hyperparameters there as well, but that's a more complicated matter.
In summary: for full GPs, the kernel parameters only matter for $\mathrm{KL}(q(f)\,\|\,p(f))$, while for sparse GPs they also influence the expected log-likelihood, but that's it.
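To spell out the decomposition being referred to (a standard variational-GP sketch in my own notation, with $\theta$ the kernel hyperparameters, not a quote from the package docs):

```math
\mathcal{L}(q, \theta) = \mathbb{E}_{q(f)}\big[\log p(y \mid f)\big] - \mathrm{KL}\big(q(f) \,\|\, p(f \mid \theta)\big)
```

For a full GP, $\theta$ enters only through the prior $p(f \mid \theta)$ in the KL term. For a sparse GP, $q(f) = \int p(f \mid u, \theta)\, q(u)\, \mathrm{d}u$ itself depends on $\theta$, so the expected log-likelihood term picks it up as well.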
Just to clarify my understanding: the `qΩ = aux_posterior(lik, y, qf)` step should be ignored by the AD system, even though `lik` and `qf` depend on the parameters one wants to optimize over, such as likelihood parameters, inducing point locations, and variational parameters?
Oh yeah, sorry, somehow I got confused with the updates on `q(f)`. But it's the same thing: `qΩ` is optimized via `aux_posterior`, and once this is obtained we can compute the ELBO and optimize the remaining hyperparameters.
EDIT: Ah, I think I now understand: one should not expose the variational parameters to the optimizer, but instead run an internal CAVI loop for them (see the sketch below).
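A minimal sketch of that two-level scheme, assuming the `aux_posterior(lik, y, qf)` signature from above; `build_qf`, `update_qf`, and `augmented_elbo` are hypothetical placeholders for constructing/updating `q(f)` and assembling the ELBO, and plain gradient descent stands in for whatever optimizer is actually used:

```julia
using Zygote

# Outer loop: gradient step on the hyperparameters θ only.
# Inner loop: CAVI updates of the variational parameters, hidden from AD.
function fit!(θ, lik, y; outer_steps = 100, inner_steps = 5, lr = 1e-2)
    for _ in 1:outer_steps
        g, = Zygote.gradient(θ) do θ
            # The CAVI updates are implicit optimizations; hide them from AD.
            qf, qΩ = Zygote.@ignore begin
                qf = build_qf(θ)                   # hypothetical: q(f) from θ
                qΩ = aux_posterior(lik, y, qf)
                for _ in 1:inner_steps
                    qf = update_qf(lik, y, qΩ, θ)  # hypothetical CAVI update of q(f)
                    qΩ = aux_posterior(lik, y, qf) # closed-form update of q(Ω)
                end
                (qf, qΩ)
            end
            # θ re-enters differentiably here, e.g. through the prior in the KL term.
            -augmented_elbo(lik, y, qf, qΩ, θ)     # hypothetical ELBO assembly
        end
        θ .-= lr .* g                              # plain gradient descent on θ
    end
    return θ
end
```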
Still struggling to make it work though. Do you have a working example for hyperparameter optimization of the augmented ELBO?
No hurry though; this is not urgent, but it would be nice to make it work and compare it to the normal SVGP optimization loop for speed.
I tried to AD `aug_elbo` in the `NegBinomialLikelihood` example, i.e. (removed unnecessary bits), purposefully avoiding ParameterHandling.jl and trying only with `ForwardDiff.gradient`.
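For reference, this is roughly the shape of the attempted call (an illustrative sketch, not the removed snippet; `make_loss` is a hypothetical closure that rebuilds the likelihood and `q(f)` from a flat parameter vector and returns the negative augmented ELBO):

```julia
using ForwardDiff

loss = make_loss(y)   # hypothetical: θ -> -aug_elbo(lik(θ), y, qf(θ))
θ0   = randn(5)       # illustrative starting point

# This is where ForwardDiff's Dual numbers flow into aux_posterior:
# unlike Zygote, ForwardDiff has no @ignore mechanism, so the auxiliary
# update cannot simply be hidden from the AD pass.
g = ForwardDiff.gradient(loss, θ0)
```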
There is an easy fix (happy to open a PR): change the definition of `aux_posterior` as follows.

BTW: is it expected that the values of the augmented ELBO are so much larger in magnitude than the normal ELBO?