SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License

Fast sampling from posterior. Fast (approximate?) posterior_samples_f #779

Closed vabor112 closed 4 years ago

vabor112 commented 5 years ago

Hello! I want to model a rather large problem with Gaussian processes:

1. The process is multi-output, with ~100 outputs.
2. I have around 50000 data points.
3. I want to draw samples of my multi-output GP on a grid of size ~500x500.

I know that SparseGPRegression should help with "2".

For "3" I know only posterior_samples_f and it won't work on a problem this big (it is O(N^3) for N=500*500*100>10^7 for me).

Are there any posterior sampling approximations that could help? I would really appreciate it if somebody could point me to where something like this is implemented in GPy, or give links to the literature (papers) on the subject so I could implement it myself.

vabor112 commented 4 years ago

My colleagues and I wrote a paper on this problem: https://arxiv.org/abs/2002.09309 (to appear in the proceedings of ICML 2020). Although only the single-output case is discussed there, the approach generalizes easily to the multi-output case. The code for GPflow is available here and the Julia package is available here. We haven't written a GPy version, but it should be easy to develop given the existing GPflow one. With this, I close the issue.
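For readers landing here, the core idea of the paper (pathwise sampling via Matheron's rule) can be sketched in plain NumPy. This is a hedged illustration under simplifying assumptions (single output, RBF kernel, random Fourier features for the prior), not the authors' released code; names like `rff_features` are made up for the sketch. A prior function draw is approximated with Fourier features, then corrected with the data, so evaluating the posterior sample at m grid points costs only O(m*n) instead of O(m^3).

```python
import numpy as np

def rff_features(X, omega, tau):
    # Random Fourier feature map for a unit-variance RBF kernel
    proj = X @ omega.T + tau                       # (n, M)
    return np.sqrt(2.0 / omega.shape[0]) * np.cos(proj)

def rbf_kernel(A, B, lengthscale):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
lengthscale, noise = 0.2, 0.1
n, M = 50, 500                                     # training points, features
X = rng.uniform(0, 1, (n, 1))
y = np.sin(6 * X[:, 0]) + noise * rng.standard_normal(n)

# 1) Approximate prior function draw: f_prior(.) = phi(.) @ w, w ~ N(0, I).
omega = rng.standard_normal((M, 1)) / lengthscale  # RBF spectral density
tau = rng.uniform(0, 2 * np.pi, M)
w = rng.standard_normal(M)
f_prior = lambda Z: rff_features(Z, omega, tau) @ w

# 2) Matheron's rule: correct the prior draw with the observed data,
#    f_post(.) = f_prior(.) + k(., X) (K + s^2 I)^{-1} (y - f_prior(X) - eps).
K = rbf_kernel(X, X, lengthscale) + noise**2 * np.eye(n)
eps = noise * rng.standard_normal(n)
v = np.linalg.solve(K, y - f_prior(X) - eps)

def f_post(Z):
    # Evaluating the sample at m new points is O(m * n), so a huge grid
    # can be evaluated in cheap batches rather than jointly.
    return f_prior(Z) + rbf_kernel(Z, X, lengthscale) @ v

Xgrid = np.linspace(0, 1, 1000)[:, None]
sample = f_post(Xgrid)                             # one function draw
assert sample.shape == (1000,)
```

The key design point is that the expensive solve is done once against the n training points, after which the posterior function draw can be evaluated anywhere, which is what makes huge evaluation grids tractable.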