cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Is there a way to set the uncertainty of the observations for conditioning? #890

Open Galto2000 opened 5 years ago

Galto2000 commented 5 years ago

Howdy folks,

Say we trained a GP on data with some known standard deviation d1. Then we deploy this GP in the world, but with sensors that might be noisier than the one from which the training data was obtained: d2 > d1.

I am using model.set_train_data(obs_x, obs_y) to condition on the (noisier) observations, and then I get my predictive distribution as follows:

    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        test_x = torch.linspace(1000, 0, 100)
        predictions = likelihood(model(test_x))

Where do I need to account for the noise of the observations (d2)? Do I set the likelihood to d2?

Thanks in advance

Galto

jacobrgardner commented 5 years ago

Yeah, the likelihood noise would correspond to the known observation noise.

You could also just do pred = model(test_x) and then real_variance = pred.variance + d2 ** 2, since that's all the likelihood is really doing in the regression setting.
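Roughly, a sketch of the single-sensor case (assuming model and likelihood are your already-trained ExactGP / GaussianLikelihood pair, and d2 is the known noise standard deviation of the deployed sensor):

    import torch
    import gpytorch

    d2 = 0.2  # hypothetical: known noise std dev of the deployed sensor

    model.eval()
    likelihood.eval()

    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        test_x = torch.linspace(1000, 0, 100)

        # Option 1: overwrite the likelihood noise with the known observation
        # variance (note: noise is a variance, hence d2 ** 2)
        likelihood.noise = torch.tensor(d2 ** 2)
        pred_y = likelihood(model(test_x))

        # Option 2: take the latent prediction and add the noise variance by hand
        pred_f = model(test_x)
        real_variance = pred_f.variance + d2 ** 2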

Galto2000 commented 5 years ago

Thanks @jacobrgardner

How would I go about handling two or more sets of observations, each with different noise? Say we have different sensors measuring the same property, but with different accuracies. I would then like to condition my model on all observation sets in order to get a predictive posterior distribution, in a sensor-fusion kind of way.

jacobrgardner commented 5 years ago

You probably want to just write a new likelihood for that. If your data is sorted reasonably, e.g. [all data from source 1; then all data from source 2], you basically want something like the fixed noise likelihood, which adds DiagLazyTensor([d1, d1, ..., d1, d2, d2, ..., d2]) to the covariance matrix.
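For example, something along these lines with the built-in fixed-noise likelihood (a sketch; d1, d2, n1, n2 are made-up per-sensor noise standard deviations and point counts, and the training data is assumed to be sorted by sensor):

    import torch
    import gpytorch

    d1, d2 = 0.05, 0.20   # hypothetical per-sensor noise std devs
    n1, n2 = 80, 40       # hypothetical number of points from each sensor

    # One noise *variance* per training point: [d1^2, ..., d1^2, d2^2, ..., d2^2]
    noise = torch.cat([
        torch.full((n1,), d1 ** 2),
        torch.full((n2,), d2 ** 2),
    ])

    # FixedNoiseGaussianLikelihood adds this diagonal to the covariance matrix
    likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(noise=noise)
    # then build the ExactGP model with this likelihood as usual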

Galto2000 commented 5 years ago

I have pretty good control over the sensors, so it can be done that way.

Thanks for the advice Jacob!

Galto2000 commented 5 years ago

How would I go about doing the same, but for a multi-task GP (say two outputs)?

Galto2000 commented 5 years ago

Hi @jacobrgardner

I think I am missing something or misunderstood your explanation. The way I am interpreting your explanation is that the measurement noise is added after the prediction (i.e. conditioning) has been done. But I always understood that conditioning (with noise) is done by replacing Kxx with Kxx + sigma^2*I in the equations for the predictive mean and predictive covariance. So somehow the measurement noise has to get "inserted" before doing the prediction, no?

[Image: predictive mean and covariance equations for noisy observations, from Rasmussen & Williams]
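For reference, the equations I have in mind are the standard predictive equations with noisy observations (R&W eqs. 2.23-2.24):

    \bar{f}_* = K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\, y
    \mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\, K(X, X_*)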

Your explanation about organizing the data made perfect sense to me, and so if I wanted to do multi-sensor fusion/prediction, for n sensors each with its own noise variance (sigma_1^2, sigma_2^2, ..., sigma_n^2), then I would need to do something like this:

[Image: proposed predictive equations with a per-sensor noise matrix in place of sigma^2 * I]
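That is, something like this, with the stacked per-sensor noise variances on the diagonal in place of sigma_n^2 I:

    \Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_1^2,\; \sigma_2^2, \ldots, \sigma_2^2,\; \ldots,\; \sigma_n^2, \ldots, \sigma_n^2)
    \bar{f}_* = K(X_*, X)\,[K(X, X) + \Sigma]^{-1}\, y
    \mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\,[K(X, X) + \Sigma]^{-1}\, K(X, X_*)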

I am not understanding how adding DiagLazyTensor([d1, d1, ..., d1, d2, d2, ..., d2]) to the prediction is equivalent.

On top of all that, I want to do this with multi-task GP regression, since each sensor measures various properties (all at the same locations, but with different sensors and thus different noise levels), like temperature, pressure, % water vapour, etc.

Galto2000 commented 5 years ago

I am still confused about this :)

So I have used some data (X_train, Y_train) for solving for the hyperparameters (i.e. for training).

This gives me the hyperparameters for my kernel function K(., .).

Now I want to 'deploy' kernel K(.,.) in a real-life application, and use it on new and unseen data (X1,Y1) to make predictions Y* over new locations X*.

This new data comes from sensors other than the one used to produce the training data, and so will have "unseen" noise Var_Y1.

So, all in all, I want to implement something like the following:

1) Train my kernel on some data (I only do this once):

gp_trainer.set_training_data(X_train, Y_train)
gp_trainer.save_kernel_to_file('my_kernel.pt')

2) Deploy my kernel K (on some embedded microcontroller on a drone for example) that I have stored in a pickle file:

gp_prior.load_kernel('my_kernel.pt')

3) compute the predictive distribution based on new observations:

gp_posterior = gp_prior.condition_on(X1, Y1, Var_Y1)

4) then make predictions:

Y* = gp_posterior.make_prediction(X*)
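In GPyTorch terms, I imagine this would look roughly like the sketch below (just my guess at the mapping: ExactGPModel is my own ExactGP subclass, FixedNoiseGaussianLikelihood carries the known per-point variances Var_Y1, and the other names mirror the pseudocode above):

    import torch
    import gpytorch

    # 1) Train once on (X_train, Y_train) (training loop omitted), then save
    #    the learned hyperparameters rather than pickling the whole model
    torch.save(model.state_dict(), 'my_kernel.pt')

    # 2) + 3) "Deploy": rebuild the model around the new observations (X1, Y1)
    #    and their known noise variances Var_Y1, then load the trained
    #    hyperparameters (strict=False because the deployed likelihood class
    #    differs from the one used during training)
    likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(noise=Var_Y1)
    gp_posterior = ExactGPModel(X1, Y1, likelihood)
    gp_posterior.load_state_dict(torch.load('my_kernel.pt'), strict=False)

    # 4) Predict at the new locations X_star
    gp_posterior.eval()
    likelihood.eval()
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        pred_f = gp_posterior(X_star)  # predictive distribution over the latent f*
        # (observation noise at X_star could then be added on top of pred_f.variance)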

Given the equations above from Rasmussen's book (http://www.gaussianprocess.org/gpml/chapters/RW.pdf), the uncertainties from the online observations need to be included in the computation of the predictive distribution (see equations 2.20 through 2.24, "Predictions using Noisy Observations").

It's currently not obvious to me how adding these noises after the predictive distribution has been computed, by applying a likelihood to it,

likelihood(model(test_x))

is equivalent to the operation described in equations 2.20 through 2.24 in Rasmussen's text (or the equations in the previous post, in the case of conditioning on multiple observations from different sensors with different noises).

I haven't really dug deep into the GPyTorch code, so I assume that somehow the noise is "extracted" from the likelihood that was passed as an argument when instantiating an ExactGP object, and then applied in the computation of the predictive distribution?