cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Using SVGPs with correlated input uncertainties #1357

Open cisprague opened 3 years ago

cisprague commented 3 years ago

I am trying to model a large-scale (>1M points) 2D field with an SVGP f: (x, y) -> z, where each input (x, y)_i has its own distribution N(0, Σ_i). Further, the input distributions are correlated with each other, since they come from vehicle dead-reckoning. In this case, would there be an issue with using the input-distribution sampling method shown in #913? Are there other SVGP-compatible methods that could accommodate this particular case? Could you recommend some things to look at in the literature for this? @gpleiss @jejjohnson, perhaps you have experience with this? Thank you!

jacobrgardner commented 3 years ago

Are all input distributions correlated with all other input distributions? If so, you may have to ignore these cross-input correlations for minibatch training to be technically correct.
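To make the "ignore cross-input correlations" option concrete, here is a minimal numpy sketch of what that looks like: each training step, resample every input in the minibatch from its own *marginal* N(μ_i, Σ_i), dropping the off-diagonal blocks of the joint covariance. The names (`mus`, `Sigmas`, `sample_inputs`) are hypothetical stand-ins, not part of the GPyTorch API.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Hypothetical dead-reckoned pose estimates: per-point means and 2x2 covariances.
mus = rng.normal(size=(n, 2))
A = rng.normal(size=(n, 2, 2))
Sigmas = A @ A.transpose(0, 2, 1) + 1e-6 * np.eye(2)  # make each Sigma_i PSD

def sample_inputs(mus, Sigmas, rng):
    """Draw one noisy (x, y) per point from its marginal input distribution,
    ignoring correlations between different points."""
    L = np.linalg.cholesky(Sigmas)               # (n, 2, 2) per-point factors
    eps = rng.normal(size=(mus.shape[0], 2, 1))  # standard normal draws
    return mus + (L @ eps)[..., 0]               # (n, 2)

batch = sample_inputs(mus, Sigmas, rng)  # feed this into the SVGP each step
```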

If you do that, you can probably get away with the instance sampling from that example, or using a distributional kernel like here: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/kernels/distributional_input_kernel.py. Basically the distributional kernel lets you pack the full distribution into the last dimension of x however you'd like, as long as you specify a distance function that computes a distribution distance between pairs of distributions.
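As a sketch of the distributional-kernel idea: pack each input's mean and variance into the last dimension, then build the kernel matrix from a distribution distance, e.g. `k(p, q) = exp(-d(p, q))` with a symmetrized KL divergence between 1-D Gaussians. This is plain numpy mimicking the interface described above, not the actual `DistributionalInputKernel` implementation; all names here are illustrative.

```python
import numpy as np

def symmetrized_kl(p, q):
    """Symmetrized KL divergence between 1-D Gaussians packed as [mean, variance]."""
    m1, v1 = p
    m2, v2 = q
    d2 = (m1 - m2) ** 2
    return (v1 + d2) / (2 * v2) + (v2 + d2) / (2 * v1) - 1.0

def distributional_kernel(X1, X2, dist=symmetrized_kl):
    """k(p, q) = exp(-dist(p, q)) over rows that pack (mean, variance)."""
    K = np.empty((len(X1), len(X2)))
    for i, p in enumerate(X1):
        for j, q in enumerate(X2):
            K[i, j] = np.exp(-dist(p, q))
    return K

X = np.array([[0.0, 1.0],   # columns: mean, variance
              [0.5, 2.0],
              [3.0, 0.5]])
K = distributional_kernel(X, X)
```

Because the distance is zero between a distribution and itself, the diagonal is exactly 1, and a symmetric distance yields a symmetric kernel matrix.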

ignaciotb commented 3 years ago

Thanks for the answer @jacobrgardner. All our inputs are measurements taken from a mobile robot, and so they're all correlated through the uncertainty in the robot localization. Would it be possible to use a "kernel mean embedding" kernel with the SVGP to take care of that?

cisprague commented 3 years ago

@jacobrgardner

Are all input distributions correlated with all other input distributions?

Indeed, the input distributions are correlated through a Kalman filter and a sensor measurement model. One valid approach we see is in here, where they approximate a kernel over correlated distributions (Eq. 16) with a quadrature rule in Sec. 3.6.1. However, it's not a variational approach and doesn't scale.
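For reference, the quadrature idea can be sketched in 1-D: approximate the expected RBF kernel under Gaussian input uncertainty, E_{x~N(μ, s²)}[k(x, x₀)], with Gauss-Hermite quadrature, and compare against the known closed form (the lengthscale inflates by the input variance). This is a generic illustration of kernel expectation via quadrature, not the exact construction from the paper; function names are ours.

```python
import numpy as np

def expected_rbf_quadrature(mu, s2, x0, ls=1.0, n_pts=20):
    """E_{x~N(mu, s2)}[exp(-(x - x0)^2 / (2 ls^2))] via Gauss-Hermite
    quadrature, using the change of variables x = mu + sqrt(2 s2) t."""
    t, w = np.polynomial.hermite.hermgauss(n_pts)
    x = mu + np.sqrt(2.0 * s2) * t
    return (w * np.exp(-(x - x0) ** 2 / (2 * ls ** 2))).sum() / np.sqrt(np.pi)

def expected_rbf_exact(mu, s2, x0, ls=1.0):
    """Closed form of the same expectation: an RBF with inflated lengthscale."""
    return ls / np.sqrt(ls ** 2 + s2) * np.exp(-(mu - x0) ** 2 / (2 * (ls ** 2 + s2)))
```

With ~20 quadrature points the two agree to high precision in 1-D; the difficulty the paper addresses is doing this over *correlated* input distributions, where the expectation no longer factorizes per point.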

We're also considering the variational GP-LVM approach, so that we can learn the posterior over the inputs as well (instead of only over the function), but it's not clear to us how to take the input correlations into account there.