SciML / Surrogates.jl

Surrogate modeling and optimization for scientific machine learning (SciML)
https://docs.sciml.ai/Surrogates/stable/

AbstractGP and Kriging perform badly due to lack of hyperparameter optimisation #328

Open st-- opened 2 years ago

st-- commented 2 years ago

If this isn't improved outright, it would at least be good to make it clear in the documentation, as the current state is quite confusing unless you dive into the code and realise what's missing (e.g. see #251). This might be partially resolved by #224, but to be competitive with other packages such as mogp-emulator a lot more work is needed, and this package doesn't work out of the box. (E.g. beyond hyperparameter optimisation, careful initialisation of the hyperparameters and priors on the parameters would also be required.) Happy to add a more detailed explanation if required.

vikram-s-narayan commented 2 years ago

Yes. I will add this info to the documentation. Thank you!

vikram-s-narayan commented 2 years ago

@st--

I'm planning on adding the following example to the documentation.

#this is a starter example for how to
#find optimal initial hyperparameters

using Surrogates
using AbstractGPs
using Hyperopt

sp(x) = sum(x.^2) #sphere test function to be approximated
n_samples = 50
lower_bound = [-5.12, -5.12]
upper_bound = [5.12, 5.12]

xys = sample(n_samples, lower_bound, upper_bound, SobolSample())
zs = sp.(xys)
true_val = sp((0.0,0.0)) #only one validation point is taken in this example; more points can give better results

function surrogate_err_min(kernelType, Σcandidate)
    candidate_gp_surrogate = AbstractGPSurrogate(xys, zs, gp=kernelType, Σy=Σcandidate)
    #use the absolute error at the validation point, so the search minimises the error magnitude
    return abs(candidate_gp_surrogate((0.0, 0.0)) - true_val)
end

ho = @hyperopt for i = 100,
        sampler = RandomSampler(),
        a = [GP(SqExponentialKernel()), GP(Matern32Kernel()), GP(Matern52Kernel())],
        b = LinRange(0, 1, 100)
    @show surrogate_err_min(a, b)
end
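
A possible follow-up (just a sketch, assuming Hyperopt.jl's minimizer field holds the best parameter combination in the order the candidates were declared) would be to read off the chosen settings and build the final surrogate:

#read off the best (kernel, Σy) combination found by the search
#and use it to construct the surrogate that is actually kept
best_gp, best_Σy = ho.minimizer
gp_surrogate = AbstractGPSurrogate(xys, zs, gp=best_gp, Σy=best_Σy)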

Hope this is in line with your suggestion?

st-- commented 2 years ago

Hi @vikram-s-narayan, just throwing Hyperopt.jl at it is definitely better than not optimising at all, but if I understand your example correctly, it makes a bunch of limiting assumptions: it relies on a single held-out validation point, and it only searches over the choice of kernel and the observation noise Σy, not over the kernel lengthscales or variance.

For GPs as a surrogate model, it'd be great to actually treat them properly: you can optimise all hyperparameters using the marginal likelihood as an objective, which doesn't require any validation points - it is computed on the training points themselves. See e.g. https://juliagaussianprocesses.github.io/AbstractGPs.jl/stable/examples/1-mauna-loa/#Hyperparameter-Optimization
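
To make that concrete, here is a minimal sketch (not part of Surrogates.jl; it reuses xys and zs from the example above, and uses Optim.jl with finite-difference gradients rather than the Zygote/ParameterHandling setup from the linked tutorial) of maximising the log marginal likelihood over a lengthscale, signal variance and noise variance:

#minimal sketch: tune kernel hyperparameters by maximising the
#log marginal likelihood on the training data alone
using AbstractGPs, KernelFunctions, Optim

x = [collect(p) for p in xys]   #convert sample tuples to vectors
y = zs

#θ = log of (lengthscale, signal variance, noise variance)
function negative_lml(θ)
    ℓ, σf², σn² = exp.(θ)
    kernel = σf² * with_lengthscale(SqExponentialKernel(), ℓ)
    fx = GP(kernel)(x, σn² + 1e-6)  #finite GP at the training inputs
    return -logpdf(fx, y)           #negative log marginal likelihood
end

res = Optim.optimize(negative_lml, zeros(3), LBFGS())
ℓ, σf², σn² = exp.(Optim.minimizer(res))

The fitted values could then be fed back into the surrogate, e.g. gp = GP(σf² * with_lengthscale(SqExponentialKernel(), ℓ)) and Σy = σn².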

(For more background reading, see these great tutorials: https://distill.pub/2019/visual-exploration-gaussian-processes/ and http://tinyurl.com/guide2gp)

st-- commented 2 years ago

It'd be good to make clear to readers/users what the limitations/assumptions of the examples are, so when they try it out they know that bad performance might be due to these limitations of your implementation, rather than due to any issues with the underlying method. (Then there's an incentive to improve the implementation, instead of just walking away from it thinking it's useless!)