cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

Support for temporary/fantasy training data #177

Closed. jacobrgardner closed this issue 5 years ago.

jacobrgardner commented 6 years ago

This is related to the need for "fantasy" observations for BayesOpt, where, in addition to training data, we want to condition a model on extra temporary training data with sampled function values as labels.

Right now, this is technically supported via the set_train_data method: with strict=False it allows arbitrary changes to the training data, so we could just directly append the fantasy data to the existing training data: https://github.com/cornellius-gp/gpytorch/blob/c16ec465ee6e2d9d1a6aa60e303a1292439ea275/gpytorch/models/exact_gp.py#L50
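For reference, a minimal sketch of what this looks like from the user's side today, assuming a standard single-input ExactGP; `model`, `fantasy_x`, and `fantasy_y` are placeholders rather than anything defined in this issue:

```python
import torch

# Existing training data (train_inputs is stored as a tuple of tensors on ExactGP).
train_x = model.train_inputs[0]
train_y = model.train_targets

# Fantasy points with sampled function values as labels (placeholders here).
fantasy_x = torch.randn(5, train_x.size(-1))
fantasy_y = torch.randn(5)

# Append the fantasy data; strict=False allows the data shapes to change.
model.set_train_data(
    inputs=torch.cat([train_x, fantasy_x], dim=0),
    targets=torch.cat([train_y, fantasy_y], dim=0),
    strict=False,
)

# ... condition / predict with the fantasized model ...

# The user has to remember what was added in order to undo it afterwards.
model.set_train_data(inputs=train_x, targets=train_y, strict=False)
```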

If we'd prefer a cleaner interface that doesn't require the user to track how many fantasy points they've added so they can remove them later, we could add a similar method (set_fantasy_data?) that sets self.fantasy_inputs and self.fantasy_targets attributes (default None), analogous to the train_inputs and train_targets attributes modified there.
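Roughly, the proposed interface might look something like this (set_fantasy_data, fantasy_inputs, and fantasy_targets are the hypothetical additions described above, not existing gpytorch API; written as a free function for brevity):

```python
import torch

def set_fantasy_data(model, inputs=None, targets=None):
    # Hypothetical counterpart to set_train_data: attach temporary fantasy data
    # on separate attributes, or clear it again by passing nothing.
    if inputs is not None and torch.is_tensor(inputs):
        inputs = (inputs,)  # mirror train_inputs, which is stored as a tuple
    model.fantasy_inputs = inputs
    model.fantasy_targets = targets

# Attach fantasy observations without touching the real training data ...
set_fantasy_data(model, fantasy_x, fantasy_y)
# ... and drop them again with a single call.
set_fantasy_data(model)
```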

If we did this, then we'd basically need to update __call__ in three ways.

First, we concatenate in fantasy_inputs here: https://github.com/cornellius-gp/gpytorch/blob/c16ec465ee6e2d9d1a6aa60e303a1292439ea275/gpytorch/models/exact_gp.py#L92-L95

Then concatenate fantasy_targets onto the training targets here: https://github.com/cornellius-gp/gpytorch/blob/c16ec465ee6e2d9d1a6aa60e303a1292439ea275/gpytorch/models/exact_gp.py#L104

And update the n_train argument to account for the fantasy training data here: https://github.com/cornellius-gp/gpytorch/blob/c16ec465ee6e2d9d1a6aa60e303a1292439ea275/gpytorch/models/exact_gp.py#L110
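Sketching those three changes (paraphrasing the surrounding __call__ code rather than quoting the linked lines exactly; the fantasy_inputs/fantasy_targets attributes are the proposed ones from above):

```python
# Inside ExactGP.__call__, roughly where the linked lines live:

# (1) Fold fantasy inputs into the training inputs before forming the full inputs.
train_inputs = self.train_inputs
if self.fantasy_inputs is not None:
    train_inputs = tuple(
        torch.cat([t, f], dim=-2) for t, f in zip(train_inputs, self.fantasy_inputs)
    )

# (2) Fold fantasy targets into the training targets.
train_targets = self.train_targets
if self.fantasy_targets is not None:
    train_targets = torch.cat([train_targets, self.fantasy_targets], dim=-1)

# (3) Grow n_train so the joint distribution is split at the right index.
n_train = train_targets.size(-1)
```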

Balandat commented 6 years ago

Is there a way to do this efficiently without resetting the mean/covar caches? Seems like re-computing the full kernel matrix would be quite expensive if all we do is modify the data by adding a small number of fantasies.

jacobrgardner commented 6 years ago

Dealing with the mean cache is essentially a matter of solving linear systems with bordered matrices: we want to update K^{-1} y to [K A; B C]^{-1} [y; y_fantasy]. Methods exist for this (e.g., https://www.researchgate.net/publication/307559841_Linear_systems_of_equations_with_bordered_matrices, and a few are discussed in Golub & Van Loan) that may or may not be faster in actual wall-clock time than just doing the solve from scratch -- our single solves are pretty fast at this point if you have a GPU.
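For concreteness, here's a small numerical sketch of the block (Schur complement) identity such an update would exploit, in plain torch for the symmetric case A = B^T that arises with a kernel matrix; K, B, C, y, and y_fantasy are stand-ins, not gpytorch internals:

```python
import torch

n, m = 50, 3
X = torch.randn(n + m, 2)
full_K = X @ X.t() + torch.eye(n + m)            # toy SPD "kernel" matrix
K, B, C = full_K[:n, :n], full_K[n:, :n], full_K[n:, n:]
y, y_fantasy = torch.randn(n), torch.randn(m)

# Quantities we would already have (or could cheaply form) from the old model:
Kinv_y = torch.linalg.solve(K, y)                # the cached K^{-1} y
Kinv_Bt = torch.linalg.solve(K, B.t())           # m extra solves against K

# Schur complement of K in the bordered matrix -- only m x m.
S = C - B @ Kinv_Bt
w = torch.linalg.solve(S, y_fantasy - B @ Kinv_y)   # bottom block of the new solution
v = Kinv_y - Kinv_Bt @ w                             # top block of the new solution

# Same answer as solving the bordered system from scratch.
direct = torch.linalg.solve(full_K, torch.cat([y, y_fantasy]))
assert torch.allclose(torch.cat([v, w]), direct, atol=1e-3)
```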

The covar cache is only computed/used with LOVE, and updating it would take some more thought. This is actually something @andrewgordonwilson and I are actively researching: how to update LOVE in the setting where individual data points are added. I have some ideas about this, but they are a bit too involved for a GitHub issue.

jacobrgardner commented 6 years ago

For what it's worth, doing the solve from scratch using CG has the same asymptotic complexity for exact GPs (O(n^2)) as the "standard" way you'd do this update in a Cholesky-based GP package, which involves using the Schur complement and Woodbury formula.

Balandat commented 6 years ago

> may or may not be faster in actual wall clock time than just doing the solve from scratch -- our single solves are pretty fast at this point if you have a GPU.

Being smart about warm-starting should probably be very helpful here as well, right? E.g. the initial guess could take the solution from the previous solve for the existing points, plus something ad hoc for the new fantasy points, like the mean across the previous solution.
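A tiny sketch of what such a warm start could look like (prev_solve stands in for the cached K^{-1} y from the previous solve; neither this helper nor its arguments are existing gpytorch hooks):

```python
import torch

def warm_start_guess(prev_solve, num_fantasy, pad_with_mean=False):
    # Reuse the previous solution for the existing points; for the new fantasy
    # points, pad with zeros or with the mean of the previous solution.
    if pad_with_mean:
        pad = prev_solve.new_full((num_fantasy,), prev_solve.mean().item())
    else:
        pad = prev_solve.new_zeros(num_fantasy)
    return torch.cat([prev_solve, pad], dim=-1)

# e.g. initial CG iterate for the enlarged system:
# x0 = warm_start_guess(prev_solve, num_fantasy=5)
```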

jacobrgardner commented 6 years ago

Good point: initializing with the existing mean cache, with a few extra zeros concatenated for the fantasy examples, is a smart idea. Assuming we don't expect the training data to ever change radically, what do you think about making this the default behavior: whenever a mean cache already exists, expand it to match the new training data size and always use it as the initialization?

Balandat commented 6 years ago

That sounds good to me. Would you want to use zeros or the mean across the mean cache?

jacobrgardner commented 6 years ago

Alright, I gave this some more thought. For temporary fantasy points and exact GPs specifically, there is an O(kn + m^2) time approximate solution for updating both the mean cache and the covar cache IF we are already using LOVE, where m is the number of fantasy points and k is the rank of the decomposition used for LOVE.

I'll implement this idea as the default behavior when gpytorch.settings.fast_pred_var() is on, since the approximation should be exactly as good as the LOVE variances anyway, and fall back to the warm-start initialization we discussed above when it is off.

This will require some fairly involved internal changes, so I can either get started on it now or continue with the original plan of helping finish up priors first.

cc @andrewgordonwilson, since the trick I'm talking about here is highly relevant to our discussion about updating the precomputed cache for LOVE.

Balandat commented 6 years ago

Let's try to get the priors in first, so we can avoid working on diverging branches as much as possible.