STOR-i / GaussianProcesses.jl

A Julia package for Gaussian Processes
https://stor-i.github.io/GaussianProcesses.jl/latest/

strange underestimations of noise around training points #219

Closed odunbar closed 1 year ago

odunbar commented 2 years ago

Issue

Using Julia 1.7.2 and GaussianProcesses 0.12.4

I seem to be getting strange artifacts with noisy observations. Training appears to go well, and the noise parameters are learnt correctly. But on closer inspection, near the training points, the predicted noise drops by a few orders of magnitude.

Sorry for the rough code below; I extracted it from another example, but it runs and produces the output shown:

Example


using Random
using Distributions
using Statistics
using LinearAlgebra
using GaussianProcesses

n = 200  # number of training points
p = 2   # input dim 
d = 1   # output dim

X = 2.0 * π * rand(p, n)             # training inputs, one column per point
gx = sin.(X[1, :]) .+ cos.(X[2, :])  # noiseless targets

μ = 0.0
Σ = 0.1  # noise standard deviation, so the true noise variance is 0.01
noise_samples = rand(Normal(μ, Σ), n)

Y = gx .+ noise_samples

# example kernel: anisotropic squared exponential plus a white-noise kernel
rbf_len = log.(ones(size(X, 1)))
rbf_logstd = log(1.0)
rbf = SEArd(rbf_len, rbf_logstd)
white_logstd = log(0.1)  # matches the true noise sd, to make this easy
white = Noise(white_logstd)

kern = rbf + white

# small regularization noise (the GPE logNoise parameter)
logstd_regularization_noise = log(sqrt(1e-5))

# Instantiate GP model
m = GaussianProcesses.GPE(
    X, Y, MeanZero(), kern, logstd_regularization_noise)

# train; noise = false keeps the regularization noise fixed during optimization
optimize!(m, noise = false)
println("learnt kernel: ", m.kernel)

# observation-noise variance learnt from the samples (the Noise kernel's σ2)
learned_σ2 = m.kernel.kright.σ2
println("true noise var: ", Σ * Σ)
println("learnt noise var: ", learned_σ2)

# now predict at test points that move slowly away from a training point
exp_vals = [0.0, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
test_pts = X[:, 1] .+ [exp_vals exp_vals]'  # first training point shifted in both coordinates
μ_test = zeros(size(test_pts, 2))
σ2_test = zeros(size(test_pts, 2))
for i in 1:size(test_pts, 2)
    # predict_y returns the predictive mean and variance including observation noise
    pair = GaussianProcesses.predict_y(m, reshape(test_pts[:, i], 2, 1))
    μ_test[i] = pair[1][1]
    σ2_test[i] = pair[2][1]
end

println("predicted variances at Xtrain .+ dist:")
println("dist = ", exp_vals)
println(σ2_test)

Example Output

learnt kernel: Type: SumKernel{SEArd{Float64}, Noise{Float64}}
  Type: SEArd{Float64}, Params: [0.879242396397144, 0.8484489240981052, 0.4354946224590022]
  Type: Noise{Float64}, Params: [-2.318057558697107]
true noise var: 0.010000000000000002
learnt noise var: 0.009695289622026527
predicted variances at Xtrain .+ dist:
dist = [0.0, 1.0e-9, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5]
[1.9990297251682988e-5, 1.9990297251682988e-5, 1.999029725079481e-5, 0.010271304568153217, 0.010271304508719426, 0.01027130391443214]

Notice how it learns the noise correctly (~0.01), but near the training point it predicts ~2e-5 before jumping to ~1e-2 further away.
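
For what it's worth, that low value is almost exactly twice the regularization noise variance (2 × 1e-5). As a rough sanity check (a sketch, assuming that the white-noise kernel also contributes to the covariance between the test point and the coincident training point, which is what the reply below explains, and that predict_y adds exp(2·logNoise) on top of the latent variance), the single coincident training point alone roughly reproduces the number:

# Rough single-point check (a sketch: the other 199 training points are ignored),
# using the learnt parameters printed above.
k_f = exp(2 * 0.4354946224590022)  # SEArd signal variance, exp(2 * log-std)
σ2  = 0.009695289622026527         # learnt Noise-kernel variance
ε2  = 1e-5                         # regularization noise variance, exp(2 * logNoise)

a = k_f + σ2   # prior covariance of the training point with itself
k_star = a     # the white-noise kernel also fires at the duplicate test location
latent = a - k_star^2 / (a + ε2)   # posterior latent variance ≈ ε2, since ε2 << a
println(latent + ε2)               # ≈ 2.0e-5, close to the 1.999e-5 in the output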

maximerischard commented 1 year ago

What you're seeing makes sense to me. The Noise kernel is perhaps not what you expected: it is defined as k(x, x′) = σ²δ(x − x′), where δ is the delta function (although this is making me realise the notation is not quite right; in effect it is a Kronecker delta on locations). This means that instead of the observation noise being independent between any two observations, it is independent between any two locations. So when you ask for a prediction at a location that already has an observation in the training set, the model has more information than at a location without one, hence the lower predicted variance. Looking at the implementation, there is some numerical tolerance built in: we use ≈ (isapprox) instead of ==, see the Julia documentation. That's why you only see a jump once you move far enough away.
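
To illustrate the tolerance point, here is a minimal standalone sketch (not the package's actual code) of a noise covariance whose equality test uses isapprox. With isapprox's default relative tolerance of sqrt(eps(Float64)) ≈ 1.5e-8, it reproduces the jump between 1e-8 and 1e-7 seen in the output above:

# Hypothetical scalar sketch of a Kronecker-delta noise kernel with
# floating-point tolerance (the real kernel compares input vectors).
noise_cov(x, x′, σ2) = isapprox(x, x′) ? σ2 : 0.0

x = 1.0
for h in (0.0, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5)
    println("h = ", h, "   k(x, x + h) = ", noise_cov(x, x + h, 0.01))
end
# h = 0.0, 1e-9 and 1e-8 fall within the default tolerance, so the noise
# kernel still "fires" (covariance 0.01); from h = 1e-7 onward it returns
# 0.0, matching where the predicted variance jumps in the output above.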