cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Natural Gradient Optimizer #894

Closed jejjohnson closed 4 years ago

jejjohnson commented 4 years ago

Hello,

I have seen in the literature that GP algorithms which use inducing points and variational inference can sometimes converge slowly when trained with only standard optimizers such as stochastic gradient descent. This is apparently due to the difficulty of jointly optimizing the variational parameters alongside the likelihood and kernel hyperparameters. The authors of that paper suggest that a natural gradient optimizer, which takes gradient steps in the geometry of the variational distribution rather than plain Euclidean parameter space, has been shown to yield big improvements, especially in large-scale GP methods like SVGP and DeepGPs.
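For reference, below is a minimal sketch of the usual recipe: natural-gradient steps on the variational parameters combined with a standard optimizer (Adam here) for the kernel and likelihood hyperparameters. The names `gpytorch.optim.NGD` and `NaturalVariationalDistribution` follow the interface in recent GPyTorch releases, not necessarily what is on the experimental branch, so treat the exact signatures as an assumption.

```python
import torch
import gpytorch


class SVGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        # The natural parameterization is what lets NGD take
        # closed-form natural-gradient steps on q(u)
        variational_distribution = gpytorch.variational.NaturalVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Toy 1D regression data, just to make the sketch runnable
train_x = torch.linspace(0, 1, 512).unsqueeze(-1)
train_y = torch.sin(6 * train_x.squeeze()) + 0.1 * torch.randn(512)

model = SVGPModel(inducing_points=train_x[:32])
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

# Natural gradients for the variational parameters only...
ngd = gpytorch.optim.NGD(
    model.variational_parameters(), num_data=train_y.size(0), lr=0.1
)
# ...and Adam for the kernel/likelihood hyperparameters
adam = torch.optim.Adam([
    {"params": model.hyperparameters()},
    {"params": likelihood.parameters()},
], lr=0.01)

model.train()
likelihood.train()
for _ in range(100):
    ngd.zero_grad()
    adam.zero_grad()
    loss = -mll(model(train_x), train_y)  # negative ELBO
    loss.backward()
    ngd.step()
    adam.step()
```

The key point is the split: the NGD steps only touch the natural parameters of q(u), Adam handles everything else, and both optimizers are stepped on the same ELBO loss each iteration.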

I've seen that a NatGrad optimizer has already been implemented in this branch of the GPyTorch library. I wanted to know what everyone's experience with it has been, and why it is not in the main branch. Perhaps GPyTorch's matrix-vector-multiplication-based methods don't suffer from the same convergence issues with joint optimization? Or is it just a matter of code coverage?

gpleiss commented 4 years ago

Hi @jejjohnson - the natural gradients branch was an experimental branch that we started a while ago. We are definitely interested in incorporating them into GPyTorch, though we'll probably have to make some significant updates to that branch :)

We have a number of updates to our variational models in the pipeline. I think this would be a useful thing to add to the list.