cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Bug] No exploration on the boundary #1839

Open · XieZikai opened 2 years ago

XieZikai commented 2 years ago

🐛 Bug

I'm trying to use GPyTorch to do Bayesian optimization. I rewrote the bayes_opt framework to use the GPyTorch GPR in it, with the same kernel and hyperparameters (Matern, nu=2.5, UCB as the acquisition function, same UCB hyperparameters). The result using GPyTorch is weird:

[image: GPyTorch optimization trace; the suggested points never reach the boundary]

I know the optimum is on the boundary, and BO normally tends to over-explore boundaries [1], but with GPyTorch it never touches the boundary. Here is the result using bayes_opt (which uses the sklearn GPR internally):

[image: bayes_opt optimization trace; the suggested points reach the boundary]

It reaches the boundary after just a few iterations.

[1] Siivola, Eero, et al. "Correcting boundary over-exploration deficiencies in Bayesian optimization with virtual derivative sign observations." 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2018.

wjmaddox commented 2 years ago

Can you potentially add a bit more detail to the code?

This is likely an optimization issue of some sort, but it's hard to tell without more code.

Also, I'd encourage looking at botorch, a package built on top of GPyTorch with the express purpose of doing Bayes opt.
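For example, a minimal UCB loop in botorch looks roughly like this. This is only a sketch with a made-up toy objective; SingleTaskGP, UpperConfidenceBound, and optimize_acqf are the botorch counterparts of what you're hand-rolling, and botorch's beta corresponds to your kappa squared:

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import UpperConfidenceBound
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(10, 2, dtype=torch.double)     # toy inputs in [0, 1]^2
train_Y = train_X.sum(dim=-1, keepdim=True)         # toy objective values
model = SingleTaskGP(train_X, train_Y)              # Matern-2.5 kernel by default
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_model(mll)                             # fit hypers via marginal likelihood

ucb = UpperConfidenceBound(model, beta=2.576 ** 2)  # beta = kappa ** 2
bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double)
candidate, acq_value = optimize_acqf(ucb, bounds=bounds, q=1,
                                     num_restarts=10, raw_samples=512)
print(candidate)  # next point to evaluate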

XieZikai commented 2 years ago

This is how I modified the bayes_opt BayesianOptimization class (bayes_opt: pip install bayesian-optimization):

from bayes_opt import bayesian_optimization as bo
from bayes_opt.event import Events
import gpytorch
import torch
from optimizers.GpytorchBO.gpytorch_models.linear_weight_GP import ExactGPModel
from optimizers.GpytorchBO.utils import gpytorch_acq_max, train_gp_model, GpytorchUtilityFunction

cuda_available = torch.cuda.is_available()

class GpytorchOptimization(bo.BayesianOptimization):

    def __init__(self, f, pbounds, random_state=None, verbose=2,
                 bounds_transformer=None, gp_regressor=None, GP=ExactGPModel, cuda=None):
        super(GpytorchOptimization, self).__init__(f, pbounds=pbounds, random_state=random_state, verbose=verbose,
                                                   bounds_transformer=bounds_transformer)
        # use CUDA when available unless explicitly disabled
        if cuda is None:
            self.cuda = cuda_available
        else:
            self.cuda = bool(cuda)
        if self.cuda:
            print('cuda on')
        if gp_regressor is None:
            # replace the sklearn regressor with the gpytorch model class
            self._GP = GP
            self.likelihood = gpytorch.likelihoods.GaussianLikelihood()
            if self.cuda:
                self.likelihood = self.likelihood.cuda()
        self.gpytorch_model = None

    def suggest(self, utility_function):
        """Refit the GP on all observed points and maximize the acquisition."""
        if len(self._space) == 0:
            return self._space.array_to_params(self._space.random_sample())
        # build a fresh exact GP on the current observations
        gtorch_model = self._GP(torch.Tensor(self._space.params),
                                torch.Tensor(self._space.target),
                                self.likelihood)
        if self.cuda:
            gtorch_model = gtorch_model.cuda()
        train_gp_model(gtorch_model, epochs=50, verbose=0, cuda=self.cuda)
        self.gpytorch_model = gtorch_model
        suggestion = gpytorch_acq_max(
            ac=utility_function.utility,
            gp=gtorch_model,
            y_max=self._space.target.max(),
            bounds=self._space.bounds,
            random_state=self._random_state,
            cuda=self.cuda
        )
        return self._space.array_to_params(suggestion)

    def maximize(self,
                 init_points=5,
                 n_iter=25,
                 acq='ucb',
                 kappa=2.576,
                 kappa_decay=1,
                 kappa_decay_delay=0,
                 xi=0.0,
                 **gp_params):
        """Mazimize your function"""
        self._prime_subscriptions()
        self.dispatch(Events.OPTIMIZATION_START)
        self._prime_queue(init_points)
        self.set_gp_params(**gp_params)

        util = GpytorchUtilityFunction(
            kind=acq,
            kappa=kappa,
            xi=xi,
            kappa_decay=kappa_decay,
            kappa_decay_delay=kappa_decay_delay,
            cuda=self.cuda
        )
        iteration = 0
        while not self._queue.empty or iteration < n_iter:
            try:
                x_probe = next(self._queue)
            except StopIteration:
                util.update_params()
                x_probe = self.suggest(util)
                iteration += 1

            self.probe(x_probe, lazy=False)

            if self._bounds_transformer:
                self.set_bounds(
                    self._bounds_transformer.transform(self._space))

        self.dispatch(Events.OPTIMIZATION_END)
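train_gp_model is our own helper and is not included above. For reference, a standard GPyTorch exact-GP training loop of the kind it would wrap looks like this (a sketch, assuming Adam maximizing the exact marginal log likelihood):

import torch
import gpytorch

def train_gp_model_sketch(model, epochs=50, lr=0.1):
    # put the model and likelihood in training mode
    model.train()
    model.likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # the "loss" is the negative exact marginal log likelihood
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
    for _ in range(epochs):
        optimizer.zero_grad()
        output = model(*model.train_inputs)
        loss = -mll(output, model.train_targets)
        loss.backward()
        optimizer.step()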

Also, here is the acquisition function optimizer (it follows the bayes_opt structure as well):

import numpy as np
from scipy.optimize import minimize


def gpytorch_acq_max(ac, gp, y_max, bounds, random_state, n_warmup=10000, n_iter=10, cuda=False):
    # Warm up with random points
    x_tries = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_warmup, bounds.shape[0]))

    ys = ac(x_tries, gp=gp, y_max=y_max)
    x_max = x_tries[ys.argmax()]
    max_acq = ys.max()

    # Explore the parameter space more thoroughly
    x_seeds = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_iter, bounds.shape[0]))
    for x_try in x_seeds:
        # Find the minimum of minus the acquisition function
        res = minimize(lambda x: -ac(x.reshape(1, -1), gp=gp, y_max=y_max),
                       x_try.reshape(1, -1),
                       bounds=bounds,
                       method="L-BFGS-B")

        # See if success
        if not res.success:
            continue

        # Store it if better than the previous maximum.
        if max_acq is None or -res.fun[0] >= max_acq:
            x_max = res.x
            max_acq = -res.fun[0]

    # Clip output to make sure it lies within the bounds. Due to floating
    # point technicalities this is not always the case.
    return np.clip(x_max, bounds[:, 0], bounds[:, 1])
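GpytorchUtilityFunction is not shown either; for kind='ucb', a utility on a GPyTorch model would typically look like the following sketch (a hypothetical stand-in, not the exact implementation):

import numpy as np
import torch
import gpytorch

def ucb_utility_sketch(x, gp, y_max, kappa=2.576, cuda=False):
    # UCB acquisition on a gpytorch model: mean + kappa * std
    # (y_max is only needed by EI/POI, not by UCB)
    x = torch.as_tensor(np.atleast_2d(x), dtype=torch.float32)
    if cuda:
        x = x.cuda()
    gp.eval()
    gp.likelihood.eval()
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        posterior = gp.likelihood(gp(x))
        mean = posterior.mean
        std = posterior.variance.sqrt()
    return (mean + kappa * std).cpu().numpy()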

Basically, the only thing I modified is the GP regressor: I changed it from the sklearn GPR to the GPyTorch GPR. I talked to a colleague, and we think the behavior may be caused by the learnable constant mean in the GPyTorch GPR. scikit-learn uses a zero mean, so its model may predict larger variance away from the data, which enlarges the effective search region and lets it touch the boundary.

wjmaddox commented 2 years ago

What's the function that you're trying to maximize/minimize here?

I talked to a colleague and we think it may be caused by the learnable constant mean in gpytorch gpr because scikit learn uses 0 as the mean value, so that it may get a larger variance which causes a larger search space (touches the boundary).

In this case, maybe you can try gpytorch.means.ZeroMean() and see if that changes your results.
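E.g., in the model definition, something like this (a sketch, assuming the Matern-2.5 kernel you described):

import gpytorch

class ZeroMeanGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ZeroMean()  # instead of ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)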

XieZikai commented 2 years ago

We have a dataset of chemical experiments and use an ML model to simulate the objective function. The ground-truth optimum is on the boundary. I tried ZeroMean and got a similar result.

wjmaddox commented 2 years ago

How are you fitting the models in both gpytorch and sklearn? In general, differences in how the hyperparameters are optimized will also have an impact here.

For example, what are the values of the optimal hyperparameters after training, for both sets of models?
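E.g., something along these lines would print them (assuming a ScaleKernel(MaternKernel) structure; sk_gpr stands in for your fitted sklearn regressor):

# gpytorch: actual (transformed) hyperparameter values after training
print(gtorch_model.likelihood.noise.item())               # noise variance
print(gtorch_model.covar_module.outputscale.item())       # signal variance
print(gtorch_model.covar_module.base_kernel.lengthscale)  # lengthscale(s)

# sklearn: the fitted kernel carries the optimized hyperparameters
print(sk_gpr.kernel_)
print(sk_gpr.log_marginal_likelihood_value_)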