STOR-i / GaussianProcesses.jl

A Julia package for Gaussian Processes
https://stor-i.github.io/GaussianProcesses.jl/latest/

Posterior does not go through zero noise points #168

Closed pmcvay closed 3 years ago

pmcvay commented 3 years ago

After creating a very simple Gaussian process and optimizing the hyperparameters, the plot of the posterior does not go through the data points, even though the noise parameter in the model is kept at zero. This can't be right, can it? A MWE is below:

x = [-4.602788110200726,-4.139181620169343,-2.6301039432757323,-3.043617635669078,-1.8155857328974712,1.4540582528517554,2.2586733784496733,1.7758404039023823]
y = [0.11104670935329886,1.3801211456155509,0.22041330689399533,0.30802797871313936,-1.0854104743648805,-0.005292301147377261,0.1119234155593371,-0.6394042063958967]

gp = GP(x, y, MeanZero(), SE(0.0, 0.0), -Inf)
plot(gp) # correctly goes through each point
r2 = optimize!(gp, noise = false)
plot(gp) # does not go through each point
maximerischard commented 3 years ago

Thank you for the nicely reproducible MWE, which made this a lot easier. It turns out the second plot is lying to you: the GP actually does go through the points. What's happening is that the optimization took the lengthscale parameter to a tiny number:

sqrt(gp.kernel.ℓ2) # 1.7e-6

so the predictive mean is a flat line at zero that occasionally spikes to meet one of the data points and then returns to zero. Your fitted GP therefore pretty much boils down to iid normal data with mean zero and standard deviation sqrt(gp.kernel.σ2) ≈ 0.57. If you want to see it for yourself, here's a bit of plotting code (using PyPlot rather than the Plots library because I'm more familiar with it, sorry):

import PyPlot; plt = PyPlot
xx = sort([range(-5, 3; step=0.01); x])
μ, σ² = GaussianProcesses.predict_f(gp, xx)  # predict_f returns the mean and the variance
σ = sqrt.(σ²)
plt.plot(xx, μ)
plt.fill_between(xx, μ .- σ, μ .+ σ, alpha=0.1)
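To see why the mean collapses, here's a quick back-of-the-envelope check (a sketch using the fitted values quoted above, not output from the package itself): with ℓ ≈ 1.7e-6, the SE kernel is essentially zero between any two distinct data points, so the covariance matrix is approximately σ²I and the points decouple.

```julia
# Fitted values quoted above (approximate, for illustration only)
ℓ  = 1.7e-6
σ² = 0.57^2

# Squared exponential kernel as a function of distance d
se(d) = σ² * exp(-d^2 / (2 * ℓ^2))

se(0.0)   # = σ²: full prior variance at zero distance
se(1e-5)  # ≈ 0: correlation has already vanished at a distance of 1e-5
```

So any test point more than a few microns (!) from a data point reverts straight to the prior mean of zero.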

I think what you're bumping into is the inflexibility of the squared exponential kernel. This is perhaps not very well known in machine learning, but it is well known in spatial statistics (where Gaussian processes go by the name kriging). The solution is to use a kernel that is not infinitely differentiable, such as Mat52Iso; you will then get behaviour that is probably closer to what you were hoping to see.
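As a sketch of that suggestion (same data as in the MWE above; the initial log-hyperparameters of 0.0 are arbitrary starting values):

```julia
# Refit with a Matérn 5/2 kernel instead of the squared exponential;
# noise is still fixed at zero (log-noise = -Inf), so the posterior
# interpolates the data exactly.
gp2 = GP(x, y, MeanZero(), Mat52Iso(0.0, 0.0), -Inf)
optimize!(gp2, noise=false)
plot(gp2)
```

The Matérn 5/2 prior is only twice differentiable, so the optimizer is much less tempted to explain the data with a near-zero lengthscale, and the fitted mean should vary smoothly between the points rather than spiking at them.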