Open XieZikai opened 3 years ago
🐛 Bug

I'm trying to use GPyTorch to do Bayesian optimization. I rewrote the bayes_opt structure to use the GPyTorch GPR with the same kernel and hyperparameters (Matern, nu=2.5, UCB as the acquisition function, with the same UCB hyperparameters). The result using GPyTorch is weird: I know the optimum is on the boundary, and BO tends to over-explore boundaries [1], but with GPyTorch the search never touches it. Here is the result using bayes_opt (which uses the scikit-learn GPR): it reaches the boundary after just a few iterations.

[1] Siivola, Eero, et al. "Correcting boundary over-exploration deficiencies in Bayesian optimization with virtual derivative sign observations." 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2018.
Can you potentially add a bit more details to the code?
This is likely an optimization issue of some sort, but it's hard to tell without more code.
Also, I'd encourage looking at botorch, a package that's built off of GPyTorch with the express purpose of using it for Bayes opt.
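For reference, a minimal BoTorch version of a UCB step could look roughly like the sketch below. This is an illustration under assumptions, not code from the thread: it assumes a recent BoTorch where `fit_gpytorch_mll` exists, and note that BoTorch's UCB `beta` plays the role of `kappa**2`.

```python
# Minimal BoTorch UCB sketch (hypothetical example, not the poster's code).
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import UpperConfidenceBound
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy data on the unit square; replace with real observations.
train_X = torch.rand(10, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)

gp = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

# UCB with beta = kappa**2 (kappa = 2.576, as in bayes_opt's default).
ucb = UpperConfidenceBound(gp, beta=2.576 ** 2)
bounds = torch.stack([torch.zeros(2), torch.ones(2)]).double()
candidate, acq_value = optimize_acqf(ucb, bounds=bounds, q=1,
                                     num_restarts=5, raw_samples=256)
print(candidate)
```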
This is how I modified the bayes_opt `BayesianOptimization` class (bayes_opt: `pip install bayesian-optimization`):
```python
from bayes_opt import bayesian_optimization as bo
from bayes_opt.event import Events
import gpytorch
import torch

from optimizers.GpytorchBO.gpytorch_models.linear_weight_GP import ExactGPModel
from optimizers.GpytorchBO.utils import gpytorch_acq_max, train_gp_model, GpytorchUtilityFunction

cuda_available = torch.cuda.is_available()


class GpytorchOptimization(bo.BayesianOptimization):

    def __init__(self, f, pbounds, random_state=None, verbose=2,
                 bounds_transformer=None, gp_regressor=None, GP=ExactGPModel, cuda=None):
        super(GpytorchOptimization, self).__init__(
            f, pbounds=pbounds, random_state=random_state, verbose=verbose,
            bounds_transformer=bounds_transformer)
        # Use CUDA by default when available; an explicit cuda=True is only
        # honored if a GPU is actually present. (The original check silently
        # ignored an explicit cuda=True, so that is fixed here.)
        self.cuda = cuda_available if cuda is None else bool(cuda) and cuda_available
        if self.cuda:
            print('cuda on')
        if gp_regressor is None:
            self._GP = GP
        if self.cuda:
            self.likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
        else:
            self.likelihood = gpytorch.likelihoods.GaussianLikelihood()
        self.gpytorch_model = None

    def suggest(self, utility_function):
        if len(self._space) == 0:
            return self._space.array_to_params(self._space.random_sample())

        # Re-instantiate and re-train the GP on all observations so far.
        gtorch_model = self._GP(torch.Tensor(self._space.params),
                                torch.Tensor(self._space.target),
                                self.likelihood)
        if self.cuda:
            gtorch_model = gtorch_model.cuda()
        train_gp_model(gtorch_model, epochs=50, verbose=0, cuda=self.cuda)
        self.gpytorch_model = gtorch_model

        # Maximize the acquisition function over the search bounds.
        suggestion = gpytorch_acq_max(
            ac=utility_function.utility,
            gp=gtorch_model,
            y_max=self._space.target.max(),
            bounds=self._space.bounds,
            random_state=self._random_state,
            cuda=self.cuda
        )
        return self._space.array_to_params(suggestion)

    def maximize(self,
                 init_points=5,
                 n_iter=25,
                 acq='ucb',
                 kappa=2.576,
                 kappa_decay=1,
                 kappa_decay_delay=0,
                 xi=0.0,
                 **gp_params):
        """Maximize your function."""
        self._prime_subscriptions()
        self.dispatch(Events.OPTIMIZATION_START)
        self._prime_queue(init_points)
        # set_gp_params targets the (unused) sklearn GP of the base class;
        # kept for API compatibility.
        self.set_gp_params(**gp_params)

        util = GpytorchUtilityFunction(
            kind=acq,
            kappa=kappa,
            xi=xi,
            kappa_decay=kappa_decay,
            kappa_decay_delay=kappa_decay_delay,
            cuda=self.cuda
        )
        iteration = 0
        while not self._queue.empty or iteration < n_iter:
            try:
                x_probe = next(self._queue)
            except StopIteration:
                util.update_params()
                x_probe = self.suggest(util)
                iteration += 1
            self.probe(x_probe, lazy=False)

            if self._bounds_transformer:
                self.set_bounds(
                    self._bounds_transformer.transform(self._space))

        self.dispatch(Events.OPTIMIZATION_END)
```
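The `optimizers.GpytorchBO` imports above are the poster's own modules and aren't shown in the thread. A minimal sketch of what `ExactGPModel` and `train_gp_model` presumably look like, assuming a standard exact GP with a learnable constant mean, a Matern nu=2.5 kernel, and training by maximizing the exact marginal log likelihood:

```python
# Hypothetical stand-ins for the unshown helpers; the real linear_weight_GP
# module may differ from this standard exact-GP setup.
import gpytorch
import torch


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()  # learnable constant mean
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))


def train_gp_model(model, epochs=50, lr=0.1, verbose=0, cuda=False):
    if cuda:
        # ExactGP.cuda() moves parameters but not the cached training data,
        # so move the stored train inputs/targets explicitly.
        model.train_inputs = tuple(ti.cuda() for ti in model.train_inputs)
        model.train_targets = model.train_targets.cuda()
    model.train()
    model.likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
    for epoch in range(epochs):
        optimizer.zero_grad()
        output = model(*model.train_inputs)
        loss = -mll(output, model.train_targets)
        loss.backward()
        optimizer.step()
        if verbose:
            print(f'epoch {epoch}: loss {loss.item():.4f}')
```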
Also, the acquisition function optimizer (it follows the bayes_opt structure as well):
```python
import numpy as np
from scipy.optimize import minimize


def gpytorch_acq_max(ac, gp, y_max, bounds, random_state, n_warmup=10000, n_iter=10, cuda=False):
    # Warm up with random points
    x_tries = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_warmup, bounds.shape[0]))
    ys = ac(x_tries, gp=gp, y_max=y_max)
    x_max = x_tries[ys.argmax()]
    max_acq = ys.max()

    # Explore the parameter space more thoroughly
    x_seeds = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_iter, bounds.shape[0]))
    for x_try in x_seeds:
        # Find the minimum of minus the acquisition function
        res = minimize(lambda x: -ac(x.reshape(1, -1), gp=gp, y_max=y_max),
                       x_try.reshape(1, -1),
                       bounds=bounds,
                       method="L-BFGS-B")

        # See if success
        if not res.success:
            continue

        # Store it if better than the previous minimum (maximum). np.squeeze
        # handles scipy versions where res.fun is a float rather than an array.
        if max_acq is None or -np.squeeze(res.fun) >= max_acq:
            x_max = res.x
            max_acq = -np.squeeze(res.fun)

    # Clip output to make sure it lies within the bounds. Due to floating
    # point technicalities this is not always the case.
    return np.clip(x_max, bounds[:, 0], bounds[:, 1])
```
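`GpytorchUtilityFunction` isn't shown either; for `gpytorch_acq_max` to work as written, its `utility` method has to accept NumPy arrays and return NumPy values. A hedged sketch of a UCB-only version mirroring bayes_opt's `UtilityFunction` (the class body here is a guess, not the poster's code):

```python
# Hypothetical sketch of GpytorchUtilityFunction, assuming it mirrors
# bayes_opt's UtilityFunction but evaluates a GPyTorch posterior.
import gpytorch
import numpy as np
import torch


class GpytorchUtilityFunction:
    def __init__(self, kind='ucb', kappa=2.576, xi=0.0,
                 kappa_decay=1, kappa_decay_delay=0, cuda=False):
        self.kind = kind
        self.kappa = kappa
        self.xi = xi
        self._kappa_decay = kappa_decay
        self._kappa_decay_delay = kappa_decay_delay
        self._iters_counter = 0
        self.cuda = cuda

    def update_params(self):
        # Decay kappa after the delay, as bayes_opt does.
        self._iters_counter += 1
        if self._kappa_decay < 1 and self._iters_counter > self._kappa_decay_delay:
            self.kappa *= self._kappa_decay

    def utility(self, x, gp, y_max):
        # Accept NumPy input, evaluate the GPyTorch posterior, return NumPy.
        x = torch.as_tensor(np.atleast_2d(x), dtype=torch.float32)
        if self.cuda:
            x = x.cuda()
        gp.eval()
        gp.likelihood.eval()
        with torch.no_grad(), gpytorch.settings.fast_pred_var():
            pred = gp.likelihood(gp(x))
            mean, std = pred.mean, pred.variance.sqrt()
        if self.kind == 'ucb':
            return (mean + self.kappa * std).cpu().numpy()
        raise NotImplementedError(self.kind)
```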
Basically the only thing I modified is the GP regressor: I changed it from the scikit-learn GPR to a GPyTorch GPR. I talked to a colleague, and we think it may be caused by the learnable constant mean in the GPyTorch GPR, because scikit-learn uses 0 as the mean value, so it may get a larger variance, which enlarges the search (touches the boundary).
What's the function that you're trying to maximize/minimize here?
> I talked to a colleague, and we think it may be caused by the learnable constant mean in the GPyTorch GPR, because scikit-learn uses 0 as the mean value, so it may get a larger variance, which enlarges the search (touches the boundary).

In this case, maybe you can try `gpytorch.means.ZeroMean()` and see if that changes your results.
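For instance, in a model like the sketch above, that would just mean swapping the mean module (`ZeroMeanGPModel` is a hypothetical name):

```python
import gpytorch


class ZeroMeanGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Fixed zero mean, matching scikit-learn's zero-mean prior,
        # instead of the learnable ConstantMean.
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))
```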
We have a dataset of chemical experiments and use an ML model to simulate the result function. The ground-truth optimum is on the boundary. I tried ZeroMean and got a similar result.
How are you fitting the models in both gpytorch and sklearn? In general, optimization differences will also make an impact here. For example, what are the values of the optimal hyperparameters after training for both sets of models?
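For instance, something like this would print the learned hyperparameters on both sides (a sketch: `gtorch_model` is the trained GPyTorch model from the code above, and `skl_gp` stands for a hypothetical fitted scikit-learn `GaussianProcessRegressor`):

```python
# GPyTorch side: constrained (actual) hyperparameter values,
# assuming the ScaleKernel(MaternKernel) + ConstantMean sketch above.
print('lengthscale:', gtorch_model.covar_module.base_kernel.lengthscale)
print('outputscale:', gtorch_model.covar_module.outputscale.item())
print('noise:', gtorch_model.likelihood.noise.item())
print('mean:', gtorch_model.mean_module.constant.item())  # ConstantMean only

# scikit-learn side: kernel_ holds the optimized kernel hyperparameters.
print('kernel:', skl_gp.kernel_)
print('log marginal likelihood:', skl_gp.log_marginal_likelihood_value_)
```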