bayesian-optimization / BayesianOptimization

A Python implementation of global optimization with gaussian processes.
https://bayesian-optimization.github.io/BayesianOptimization/index.html
MIT License

Constrained optimization #412

Closed: leandrobbraga closed this issue 1 year ago

leandrobbraga commented 1 year ago

Hello,

I was looking into the constrained optimization code path and noticed a few things, and I would like to ask some questions to check whether my understanding is correct.

This is a snippet from the "acq_max" function:

import numpy as np
from scipy.optimize import minimize

def acq_max(ac, gp, y_max, bounds, random_state, constraint=None, n_warmup=10000, n_iter=10):
    x_tries = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_warmup, bounds.shape[0]))
    ys = ac(x_tries, gp=gp, y_max=y_max)
    x_max = x_tries[ys.argmax()]
    max_acq = ys.max()

    # Explore the parameter space more thoroughly
    x_seeds = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_iter, bounds.shape[0]))

    if constraint is not None:
        def to_minimize(x):
            target = -ac(x.reshape(1, -1), gp=gp, y_max=y_max)
            p_constraint = constraint.predict(x.reshape(1, -1))

            # TODO: This is not exactly how Gardner et al do it.
            # Their way would require the result of the acquisition function
            # to be strictly positive (or negative), which is not the case
            # here. For a negative target value, we use Gardner's version. If
            # the target is positive, we instead slightly rescale the target
            # depending on the probability estimate to fulfill the constraint.
            if target < 0:
                return target * p_constraint
            else:
                return target / (0.5 + p_constraint)
    else:
        to_minimize = lambda x: -ac(x.reshape(1, -1), gp=gp, y_max=y_max)

    for x_try in x_seeds:
        # Find the minimum of minus the acquisition function
        res = minimize(lambda x: to_minimize(x),
                       x_try,
                       bounds=bounds,
                       method="L-BFGS-B")

        # See if success
        if not res.success:
            continue

        # Store it if better than previous minimum(maximum).
        if max_acq is None or -np.squeeze(res.fun) >= max_acq:
            x_max = res.x
            max_acq = -np.squeeze(res.fun)

    # Clip output to make sure it lies within the bounds. Due to floating
    # point technicalities this is not always the case.
    return np.clip(x_max, bounds[:, 0], bounds[:, 1])

The first thing I noticed is that the y_max passed here is the "unconstrained y_max" rather than the "constrained y_max". From the original paper I can cite this:

Thus the expected constrained improvement acquisition function EI_C(x̂) is precisely the standard expected improvement of x̂ over the best feasible point so far weighted by the probability that x̂ is feasible.
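In other words, Gardner et al. weight a non-negative expected improvement by the feasibility probability. A minimal sketch of that acquisition, as I understand it (my own illustration with made-up names, not code from this library):

    import numpy as np
    from scipy.stats import norm

    def constrained_ei(mu, sigma, y_best_feasible, p_feasible, xi=0.0):
        # Expected improvement over the best *feasible* observation so far.
        # EI is non-negative by construction, so multiplying it by the
        # feasibility probability is well-defined (Gardner et al.).
        imp = mu - y_best_feasible - xi
        z = imp / np.maximum(sigma, 1e-12)
        ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
        return ei * p_feasible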

The second thing is that there is a "warm up" phase with cheap evaluations. This phase does not consider p_constraint at all, which means these candidates get higher acquisition values than they should (the same as assuming p_constraint = 1).

    x_tries = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_warmup, bounds.shape[0]))
    ys = ac(x_tries, gp=gp, y_max=y_max)
    x_max = x_tries[ys.argmax()]
    max_acq = ys.max()

Is this a bug? Does this make sense? I'm no expert in the area.

If it makes sense, I can help with a PR here.
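For concreteness, this is roughly what I have in mind for the warm-up phase, reusing the weighting already present in to_minimize (just a sketch, untested):

    # Sketch: weight the cheap warm-up evaluations by the feasibility
    # probability, mirroring the to_minimize branch above (untested).
    ys = ac(x_tries, gp=gp, y_max=y_max)
    if constraint is not None:
        p_constraint = constraint.predict(x_tries)
        ys = np.where(ys > 0, ys * p_constraint, ys / (0.5 + p_constraint))
    x_max = x_tries[ys.argmax()]
    max_acq = ys.max()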

Thank you!

till-m commented 1 year ago

Hi @leandrobbraga,

thanks for your interest in this package and for scouring the code! For context, I authored the PR that added the constrained optimization.

1

The y_max that is being used here is the maximum value of the target function for parameters that fulfil the constraints. Unless your constraint function is non-deterministic (sidenote: I cannot think of a case where this would be sensible; I don't think a constraint that is fulfilled randomly makes sense) and you have modelled this by using a GP with an additional WhiteKernel to account for noise, the result of constraint.predict for this point should always be p(y_max)=1, since the GP knows that this point fulfils the constraint. In this sense, the constraint fulfilment probability is accounted for through y_max/p(y_max) = y_max.
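To see why p(y_max)=1 at an observed feasible point, here is a small toy example (my own illustration, using scikit-learn directly rather than this library's code): a noise-free GP interpolates its training data, so the posterior standard deviation at an observed point is essentially zero and the predicted feasibility probability saturates at 1.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Toy constraint model: a point is feasible iff c(x) <= 0. With a
    # noise-free kernel the GP interpolates its training data, so at the
    # observed points the posterior std is ~0 and p_feasible is ~1.
    X = np.array([[0.0], [1.0], [2.0]])
    c = np.array([-1.0, -0.5, -2.0])  # all three observations are feasible
    gp = GaussianProcessRegressor(kernel=RBF()).fit(X, c)
    mu, std = gp.predict(X, return_std=True)
    p_feasible = norm.cdf((0.0 - mu) / np.maximum(std, 1e-12))
    print(p_feasible)  # ~[1. 1. 1.] at the training points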

2

I've thought about it, and I think you are correct: this is not working as it should. By all means, feel free to make a PR and tag me, I will gladly review it.

leandrobbraga commented 1 year ago

Thank you for taking the time to reply, I'll prepare a PR soon.

About the first point,

> The y_max that is being used here is the maximum value of the target function for parameters that fulfil the constraints. Unless your constraint function is non-deterministic (sidenote: I cannot think of a case where this would be sensible; I don't think a constraint that is fulfilled randomly makes sense) and you have modelled this by using a GP with an additional WhiteKernel to account for noise, the result of constraint.predict for this point should always be p(y_max)=1, since the GP knows that this point fulfils the constraint. In this sense, the constraint fulfilment probability is accounted for through y_max/p(y_max) = y_max.

When you look in this snippet

        # Finding argmax of the acquisition function.
        suggestion = acq_max(ac=utility_function.utility,
                             gp=self._gp,
                             constraint=self.constraint,
                             y_max=self._space.target.max(),
                             bounds=self._space.bounds,
                             random_state=self._random_state)

self._space.target here is a plain ndarray containing all the observed ys, feasible and unfeasible alike. My understanding is that the correct behaviour would be to pass the max over only the feasible ys, so that the expected improvement is calculated over the best feasible result so far. Otherwise we try to improve over a y that is unfeasible and possibly much higher than any feasible y, penalizing the region of feasible ys. I noticed this after adding debug prints on this y_max inside the code: it was indeed an unfeasible y.

On the other hand we have self.space.max(), which does respect feasibility. There is a section of code specific to the constrained case, which goes over the array of targets, gathering only the ys that fulfill the constraint before picking the highest value.

    def max(self):
        """Get maximum target value found and corresponding parameters.

        If there is a constraint present, the maximum value that fulfills the
        constraint is returned."""
        if self._constraint is None:
            try:
                res = {
                    'target': self.target.max(),
                    'params': dict(
                        zip(self.keys, self.params[self.target.argmax()])
                    )
                }
            except ValueError:
                res = {}
            return res
        else:
            allowed = self._constraint.allowed(self._constraint_values)
            if allowed.any():
                # Of all the points that fulfill the constraints, find the
                # one with the maximum value for the target function.
                sorted = np.argsort(self.target)
                idx = sorted[allowed[sorted]][-1]
                # there must be a better way to do this, right?
                res = {
                    'target': self.target[idx],
                    'params': dict(
                        zip(self.keys, self.params[idx])
                    ),
                    'constraint': self._constraint_values[idx]
                }
            else:
                res = {
                    'target': None,
                    'params': None,
                    'constraint': None
                }
            return res

Notice that the unconstrained branch uses self.target.max(), exactly as in the acq_max call.
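So the fix on this point could be as simple as feeding the feasible maximum into acq_max (a sketch of the idea, not a final patch; what to do before any feasible point has been observed is left open here):

        # Sketch: use the best *feasible* target instead of the
        # unconstrained maximum (the no-feasible-point-yet case is
        # left open here).
        res = self._space.max()  # respects the constraint, see above
        suggestion = acq_max(ac=utility_function.utility,
                             gp=self._gp,
                             constraint=self.constraint,
                             y_max=res['target'],
                             bounds=self._space.bounds,
                             random_state=self._random_state)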

Does it make sense? Or am I misunderstanding something?

till-m commented 1 year ago

Wow, good catch! I must've missed that. Sorry for the confusion and thanks for spotting this!

leandrobbraga commented 1 year ago

Ok! You can assign this issue to me. I'll author a PR for those issues!