Closed jejjohnson closed 4 years ago
Hi @jejjohnson - the natural gradients branch was an experimental branch that we started a while ago. We are definitely interested in incorporating them into GPyTorch, though we'll probably have to make some significant updates to that branch :)
We have a number of updates to our variational models in the pipeline. I think this would be a useful thing to add to the list.
Hello,
I have seen in the literature that GP algorithms which use inducing points and variational inference can converge slowly when trained only with optimizers such as stochastic gradient descent. This is apparently due to the difficulty of jointly optimizing the variational parameters alongside the likelihood and kernel hyperparameters. The authors suggest that using a natural gradient optimizer gives big improvements, especially in large-scale GP methods like SVGP and Deep GPs.
I've seen that a NatGrad optimizer has already been implemented in this branch of the GPyTorch library. I wanted to know what everyone's experience using it has been, and why it is not in the main branch. Perhaps methods built on matrix-vector multiplication don't suffer from the same convergence issues under joint optimization? Or is it just a matter of code coverage?
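To illustrate why natural gradients help here, a toy sketch (my own, not GPyTorch's implementation): for a Gaussian variational distribution the Fisher information matrix has a closed form, so rescaling the ordinary gradient by its inverse gives updates whose effective step size is largely insensitive to the parameterization. Minimizing the KL divergence to a target Gaussian stands in for the variational part of the ELBO:

```python
def natural_gradient_kl(m0, v0, mu, s, lr=0.8, steps=50):
    """Minimize KL(N(m, v) || N(mu, s)) by natural gradient descent.

    For q(x) = N(m, v), the Fisher information in the (mean, variance)
    parameterization is diag(1/v, 1/(2 v^2)); the natural gradient is the
    ordinary gradient rescaled by its inverse, diag(v, 2 v^2).
    """
    m, v = m0, v0
    for _ in range(steps):
        # Ordinary (Euclidean) gradients of KL(N(m, v) || N(mu, s)):
        #   dKL/dm = (m - mu) / s
        #   dKL/dv = -1/(2v) + 1/(2s)
        grad_m = (m - mu) / s
        grad_v = -0.5 / v + 0.5 / s
        # Precondition by the inverse Fisher and take a step.
        m = m - lr * v * grad_m
        v = v - lr * (2.0 * v ** 2) * grad_v
    return m, v
```

With a step size near 1 this converges in a handful of iterations regardless of how small the target variance `s` is, whereas a plain gradient step on `(m, v)` needs a learning rate tuned to `s` to stay stable. That mismatch is (as I understand it) the slow-convergence issue with pure SGD on variational parameters.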