OxfordML / GPz

GPz 2.0: Heteroscedastic Gaussian processes for uncertain and incomplete data
45 stars 10 forks source link

Sampling the GP model w/ heteroscedastic noise #1

Closed agerlach closed 8 years ago

agerlach commented 8 years ago

I am trying to modify your demo_sinc.py code to test it for an engineering problem I am working on and I have a few questions.

  1. What exactly is going on on lines 43-53? Is this code drawing a random sample from the distribution?
  2. If so, it appears that it is sampling using the model variance only. How could this be modified to sample using both the model and heteroscedastic noise variance?

Ultimately, I would like to be able to use GPz to draw random measurements of the underlying process for usage in Monte Carlo simulations.

Thanks and great work, I found your paper very interesting.

csibrahim commented 8 years ago

Thank you @agerlach for the question,

  1. This part of the code draws samples from the posterior distribution of the parameters 'w' not the function.
  2. To sample from the function with heteroscedastic noise you have to sample from the predictive distribution of the function which has the following mean and covariance
mu = PHI*w+X*wL+muY
C = PHI*SIGMAi*PHI'+diag(beta_i)

I am doing some preprocessing to the output by extracting its mean 'muY' and 'X*wL' where 'wL' minimizes

||X*wL-(Y-muY)||^2

So that GPz learns the non-linear deviation from the best linear regression model.

The covariance can be approximated by its diagonal which can be computed simply by adding 'nu' and 'beta_i'.

To draw 20 samples in matlab:

SIGMAi = model.best.SIGMAi;

[mu,sigma,nu,beta_i,PHI] = predict(Xs,model);

C = PHI*SIGMAi*PHI'+diag(beta_i); % or approximate by diag(nu+beta_i)=diag(sigma)

[U,S] = svd(C);

R = U*sqrt(S);

mus = bsxfun(@plus,R*randn(length(mu),20),mu);
plot(Xs,mus);
csibrahim commented 8 years ago

In case of diagonal approximation the following is much faster:

[mu,sigma] = predict(Xs,model);
mus = bsxfun(@plus,diag(sqrt(sigma))*randn(length(mu),20),mu);
plot(Xs,mus);
agerlach commented 8 years ago

Thanks for your help. This is exactly what I was looking for.

I had previously been using the python version of GPy, but I recently tried the Matlab version and I think there may be a few issues.

  1. internal.stats.parseArgs() in init.m is part of the statistic toolbox. You may want to mention that in the README or try to avoid needing it all together
  2. The Matlab version has some stability issues that don't show up in the Python version when running the sinc demos. For example, if I simply set n=2000 in demo_sinc.m I get the following error. Their is no issue in the Python version.
Error using svd
Input to SVD must not contain NaN or Inf.

Error in bayesianLinearRegression (line 10)
    [U,S,V] = svd(SIGMA);

Thanks again

csibrahim commented 8 years ago

Thanks again! I will look into this and let you know ASAP

csibrahim commented 8 years ago

Problem fixed!

I have not put this new feature in python yet, but I updated the matlab code to learn 'wL' in a more Bayesian way. I updated the bayesianLinearRegression.m file few days ago put I have not pushed it yet. I was not aware though that the old way is unstable

Thanks again for raising this issue