EmuKit / emukit

A Python-based toolbox of various methods in decision making, uncertainty quantification and statistical emulation: multi-fidelity, experimental design, Bayesian optimisation, Bayesian quadrature, etc.
https://emukit.github.io/emukit/
Apache License 2.0

Negative LCB Acquisition function in Batch Mode #330

Open chrisliu2007 opened 4 years ago

chrisliu2007 commented 4 years ago

Hi,

I found that when using Negative LCB in batch mode, the new suggestions are not at the points of highest acquisition value. I tested with the Branin function. I suspect this is because Negative LCB can produce negative values, while the local penalization method requires the acquisition function to be positive. (I'm surprised the code still runs, though.)

I wonder if an implementation of the "softplus" transformation, as in GPyOpt, could solve this problem. If not, could you please suggest an alternative? Thank you.

apaleyes commented 4 years ago

Can you share example code and some plots maybe, to illustrate the issue better? Thanks!

chrisliu2007 commented 3 years ago

Yes, of course. Here is some partial code along with the plots. I tested with the Branin function.

### Define the objective function
### In this case we use the Branin function available in Emukit.
import numpy as np
import GPy
from GPy.models import GPRegression
from emukit.test_functions import branin_function
from emukit.model_wrappers import GPyModelWrapper
from emukit.bayesian_optimization.acquisitions import NegativeLowerConfidenceBound
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop

f, parameter_space = branin_function()

## GP regression
## (X, Y are the initial design points and their evaluations, collected beforehand)
input_dim = X.shape[1]
ker = GPy.kern.Matern52(input_dim=input_dim, ARD=False)
ker.lengthscale.constrain_bounded(1e-2, 100.1)
ker.variance.constrain_bounded(1e-2, 1000.1)
#ker += GPy.kern.Bias(input_dim=input_dim)
model_gpy = GPRegression(X, Y, ker)
model_gpy.randomize()
model_gpy.optimize_restarts(num_restarts=20, verbose=False, messages=False)
objective_model = GPyModelWrapper(model_gpy)

## Negative Lower Confidence Bound (NLCB) acquisition
acquisition = NegativeLowerConfidenceBound(objective_model, beta=1)
batch = 5
# Build the loop and compute the next batch of points
bayesopt_cons = BayesianOptimizationLoop(model=objective_model,
                                         space=parameter_space,
                                         acquisition=acquisition,
                                         batch_size=batch)

X_new = bayesopt_cons.candidate_point_calculator.compute_next_points(bayesopt_cons.loop_state)

The acquired next points (green circles) are not at the highest acquisition values. I suspect it is because NLCB feeds negative values into local penalization in batch mode.

[image: acquisition surface with the acquired batch points shown as green circles]

So, I purposely used Y - 350 in the GP regression. Here is what I got:

[image: acquisition surface and batch points after offsetting Y by -350]

apaleyes commented 3 years ago

I think you are correct, thanks for the update. The reason appears to be that in batch mode we apply a log transform to the acquisition. See this section for details: https://github.com/EmuKit/emukit/blob/master/emukit/bayesian_optimization/loops/bayesian_optimization_loop.py#L54

It is indeed surprising that it still runs. Can you check your console output? It should be full of NumPy warnings about taking logs of negative numbers.

As for fixing it, we should absolutely put validation inside the LogAcquisition implementation; whether it's a warning or a straight-up exception, I am not sure yet. Beyond that, there is probably not much we can do at this point. The way batching is implemented right now, we simply cannot support it for Negative LCB.

chrisliu2007 commented 3 years ago

Thank you for the quick reply. I guess the current workaround is to offset the Y values so that they are always negative, so that the NLCB acquisition values stay positive.
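To spell out why the offset works, here is a pure-NumPy sketch of the identity (hypothetical numbers, not EmuKit code). Fitting the GP to Y - c shifts the posterior mean by roughly -c (only roughly, since the hyperparameters are refit), so NLCB(x) = -mu(x) + beta * sigma(x) shifts up by c, keeping it positive when c is large enough:

```python
import numpy as np

def nlcb(mu, sigma, beta=1.0):
    # Negative Lower Confidence Bound: -mu + beta * sigma
    return -mu + beta * sigma

mu = np.array([120.0, 5.0, 300.0])  # hypothetical posterior means
sigma = np.array([0.5, 2.0, 10.0])  # hypothetical posterior std devs
c = 350.0                           # offset subtracted from Y

# Shifting the observations down by c shifts the posterior mean down by ~c,
# which shifts NLCB up by c -- enough to make all values positive here.
shifted = nlcb(mu - c, sigma)
print(np.allclose(shifted, nlcb(mu, sigma) + c))  # True
print(np.all(shifted > 0))                        # True
```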

Do you think something similar to the "softplus" transformation in GPyOpt would probably work here? https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/acquisitions/LP.py

If so, I can try to implement that in the local_penalization function. Thanks!

apaleyes commented 3 years ago

I don't immediately see how something like that would help with negative acquisition values, but I might be wrong. Do you have any particular reason to think it would?

chrisliu2007 commented 3 years ago

I only skimmed the LP.py code in GPyOpt, so I could be wrong as well.

First, I want to confirm that your comments refer to the "softplus" transformation in LP.py.

If so, I think line 81 in LP.py, fval[fval_org<40.] = np.log(np.log1p(np.exp(fval_org[fval_org<40.]))), converts negative acquisition values into the positive np.log1p(np.exp(NLCB)).


apaleyes commented 3 years ago

Ah, that makes total sense now, thanks for clarifying. I wonder what that magic 40 is doing in there?

Is there any point in making this the default behavior of LP, instead of the simple log transform we do now? The upside is that our batch mode becomes more general: any acquisition can be used, not just positive-valued functions. What are the downsides?

chrisliu2007 commented 3 years ago

ln(1 + exp(x)) is almost exactly equal to x when x is large enough. I guess 40 was picked because it is large enough that the computation can skip the exp and log conversion entirely.
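A quick numerical check of that reading (a standalone sketch; the thresholded branching mirrors LP.py's but log_softplus is a reimplementation, not GPyOpt's actual function):

```python
import numpy as np

def log_softplus(a):
    # ln(softplus(a)) with GPyOpt-style branching: for a >= 40,
    # softplus(a) = ln(1 + e^a) equals a to double precision
    # (the correction ln(1 + e^-a) is below 1e-17), so ln(a) is used directly
    # and the exp/log round-trip is skipped.
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    small = a < 40.0
    out[small] = np.log(np.log1p(np.exp(a[small])))
    out[~small] = np.log(a[~small])
    return out

# softplus(40) is indistinguishable from 40 in float64
print(abs(np.log1p(np.exp(40.0)) - 40.0) < 1e-12)  # True

# Negative acquisition values map to small positive softplus values,
# so the subsequent log is finite instead of nan
print(np.isfinite(log_softplus(np.array([-5.0]))[0]))  # True
```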

I don't see much downside here (at least, none that I have noticed so far). The only trick is that all negative values get converted to values close to zero.

apaleyes commented 3 years ago

Summoning @mmahsereci to this thread. Maren, can you please share your opinion on this idea before we implement it?

mmahsereci commented 3 years ago

Hi, I checked the original LP paper. Quoting from the paragraph below Eq. (4):

"g : ℝ → ℝ⁺ is a differentiable transformation of α(x) that keeps it strictly positive without changing the location of its extrema. We will use g(z) = z if α(x) is already positive and the soft-plus transformation g(z) = ln(1 + e^z) elsewhere."

This suggestion from the paper is what GPyOpt uses, I suppose (apart from the magic number 40). I believe the transformation would be applied to the acquisition function only, not to the penalization term. It would probably take place just above line 55 in the BO loop here, before the log transform is applied to the product of the acquisition and the LP term. The log transform of the product happens in two places in our code, since ln(αp) = ln(α) + ln(p). So we would end up with ln(softplus(α)) + ln(p), which is, I guess, what the paper proposes.

It should be noted, though (I'm not sure whether this is stated in the paper), that applying softplus can eventually move the maximizer; hence, probably, the magic number 40, as @chrisliu2007 says. The log transform we currently perform does not move the maximizer, as it is applied to the product of the acquisition and the LP term.
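The point about the moved maximizer can be seen on a toy example (hypothetical numbers, not EmuKit code): log applied to the product α·p never moves the argmax, since log is monotone, but softplus applied to α alone can, because softplus inflates small values relatively more than large ones:

```python
import numpy as np

softplus = lambda z: np.log1p(np.exp(z))

# Toy positive acquisition values and penalization factors at two candidates
alpha = np.array([0.1, 1.0])  # acquisition values
p = np.array([8.0, 1.0])      # local penalization factors

# log is monotone, so it preserves the argmax of the product
assert np.argmax(np.log(alpha * p)) == np.argmax(alpha * p)

# softplus applied to alpha alone changes the product's maximizer here:
# softplus(0.1) ~ 0.74 inflates the small value far more than
# softplus(1.0) ~ 1.31 inflates the large one.
print(np.argmax(alpha * p))            # 1
print(np.argmax(softplus(alpha) * p))  # 0
```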

That is what I can say at first glance; it looks good to me to do this. We would need something like a SoftplusAcquisition, similar to LogAcquisition, applied before the log transform in the loop, I suppose.