SciML / Surrogates.jl

Surrogate modeling and optimization for scientific machine learning (SciML)
https://docs.sciml.ai/Surrogates/stable/

AbstractGP and Kriging perform badly due to lack of hyperparameter optimisation #328

Open st-- opened 2 years ago

st-- commented 2 years ago

If this isn't improved outright, it would at least be good to make it clear in the documentation, as the current state is quite confusing unless you dive into the code and realise what's missing (e.g. see #251). This might be partially resolved by #224, but to be competitive with other packages such as mogp-emulator a lot more work is needed, and this package doesn't work out of the box. (E.g. beyond hyperparameter optimisation, careful initialisation of the hyperparameters and priors on the parameters would also be required.) Happy to add a more detailed explanation if required.

vikram-s-narayan commented 2 years ago

Yes. I will add this info to the documentation. Thank you!

vikram-s-narayan commented 2 years ago

@st--

I'm planning on adding the following example to the documentation.

#this is a starter example for how to
#find optimal initial hyperparameters

using Surrogates
using AbstractGPs
using Hyperopt

sp(x) = sum(x.^2) #sphere test function to be approximated
n_samples = 50
lower_bound = [-5.12, -5.12]
upper_bound = [5.12, 5.12]

xys = sample(n_samples, lower_bound, upper_bound, SobolSample())
zs = sp.(xys)
true_val = sp((0.0,0.0)) #only one validation point is taken in this example; more points can give better results

function surrogate_err_min(kernelType, Σcandidate)
    candidate_gp_surrogate = AbstractGPSurrogate(xys, zs, gp=kernelType, Σy=Σcandidate)
    #use the absolute error at the validation point, so the search minimises the error magnitude
    return abs(candidate_gp_surrogate((0.0, 0.0)) - true_val)
end

ho = @hyperopt for i = 100,
        sampler = RandomSampler(),
        a = [GP(SqExponentialKernel()), GP(Matern32Kernel()), GP(Matern52Kernel())],
        b = LinRange(0, 1, 100)
    @show surrogate_err_min(a, b)
end
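
A possible follow-up (just a sketch, assuming Hyperopt.jl's minimizer field holds the best parameter combination in the order the candidates were declared) would be to read off the chosen settings and build the final surrogate:

#read off the best (kernel, Σy) combination found by the search
#and use it to construct the surrogate that is actually kept
best_gp, best_Σy = ho.minimizer
gp_surrogate = AbstractGPSurrogate(xys, zs, gp=best_gp, Σy=best_Σy)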

Hope this is in line with your suggestion?

st-- commented 2 years ago

Hi @vikram-s-narayan, just throwing Hyperopt.jl at it is definitely better than not optimising at all, but if I understand your example correctly, it makes a bunch of limiting assumptions: it relies on a single held-out validation point, and it only searches over the choice of kernel and the observation noise Σy, not over the kernel lengthscales or variance.

For GPs as a surrogate model, it'd be great to actually treat them properly: you can optimise all hyperparameters using the marginal likelihood as an objective, which doesn't require any validation points - it is computed on the training points themselves. See e.g. https://juliagaussianprocesses.github.io/AbstractGPs.jl/stable/examples/1-mauna-loa/#Hyperparameter-Optimization
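
To make that concrete, here is a minimal sketch (not part of Surrogates.jl; it reuses xys and zs from the example above, and uses Optim.jl with finite-difference gradients rather than the Zygote/ParameterHandling setup from the linked tutorial) of maximising the log marginal likelihood over a lengthscale, signal variance and noise variance:

#minimal sketch: tune kernel hyperparameters by maximising the
#log marginal likelihood on the training data alone
using AbstractGPs, KernelFunctions, Optim

x = [collect(p) for p in xys]   #convert sample tuples to vectors
y = zs

#θ = log of (lengthscale, signal variance, noise variance)
function negative_lml(θ)
    ℓ, σf², σn² = exp.(θ)
    kernel = σf² * with_lengthscale(SqExponentialKernel(), ℓ)
    fx = GP(kernel)(x, σn² + 1e-6)  #finite GP at the training inputs
    return -logpdf(fx, y)           #negative log marginal likelihood
end

res = Optim.optimize(negative_lml, zeros(3), LBFGS())
ℓ, σf², σn² = exp.(Optim.minimizer(res))

The fitted values could then be fed back into the surrogate, e.g. gp = GP(σf² * with_lengthscale(SqExponentialKernel(), ℓ)) and Σy = σn².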

(For more background reading, see these great tutorials: https://distill.pub/2019/visual-exploration-gaussian-processes/ and http://tinyurl.com/guide2gp)

st-- commented 2 years ago

It'd be good to make clear to readers/users what the limitations/assumptions of the examples are, so when they try it out they know that bad performance might be due to these limitations of your implementation, rather than due to any issues with the underlying method. (Then there's an incentive to improve the implementation, instead of just walking away from it thinking it's useless!)