SheffieldML / GPyOpt

Gaussian Process Optimization using GPy
BSD 3-Clause "New" or "Revised" License

Optimization chooses and gets stuck at an infeasible point #93

Closed jkaardal closed 6 years ago

jkaardal commented 7 years ago

Hi GPyOpt developers. I have an issue where, when the number of variables in my problem exceeds 8, GPyOpt chooses an infeasible next point and then immediately converges to a suboptimal infeasible point (infeasible with respect to the inequality constraints; the chosen point is still within the domain). What I am doing is optimizing multiple l2-norm regularization parameters, an n-dimensional vector denoted x, where the unknown function f(x) is the solution to a regularized logistic regression with respect to some weights, w, under the inequality constraints x_1 <= x_2 <= x_3 <= ... <= x_n. The purpose of these constraints is to reduce the computation time, since any permutation of the elements of x results in the same value of f(x). Here is an example of how I have the bounds and constraints set up for an n = 5 problem before they are passed into BayesianOptimization():

domain = [{'domain': (0.0, 1.0), 'type': 'continuous', 'name': 'x_1_b'},
          {'domain': (0.0, 1.0), 'type': 'continuous', 'name': 'x_2_b'},
          {'domain': (0.0, 1.0), 'type': 'continuous', 'name': 'x_3_b'},
          {'domain': (0.0, 1.0), 'type': 'continuous', 'name': 'x_4_b'},
          {'domain': (0.0, 1.0), 'type': 'continuous', 'name': 'x_5_b'}]

constraints = [{'constrain': 'x[:,0] - x[:,1]', 'name': 'x_1_2_c'},
               {'constrain': 'x[:,1] - x[:,2]', 'name': 'x_2_3_c'},
               {'constrain': 'x[:,2] - x[:,3]', 'name': 'x_3_4_c'},
               {'constrain': 'x[:,3] - x[:,4]', 'name': 'x_4_5_c'}]

Other information: I am using the Matern32 kernel (this phenomenon occurs with Matern52 as well) with maximum-likelihood updates of the Gaussian process hyperparameters, and the lower confidence bound (LCB) acquisition function (default values, except the jitter is 1e-4 to protect against singularity of the kernel matrix). The optimization has worked fine with these constraints when n < 9, and it also worked for n > 8 when the constraints were removed.
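
For reference, here is a minimal sketch of how this setup might be wired together. The dummy objective, the max_iter budget, and the kernel / acquisition_jitter keyword names are assumptions on my part and may differ between GPyOpt versions:

import numpy as np
import GPy
import GPyOpt

def f(x):
    # Dummy objective standing in for the regularized logistic regression;
    # GPyOpt expects one objective value per row of x, as a column vector.
    return np.sum((x - 0.3) ** 2, axis=1, keepdims=True)

n = 5
domain = [{'name': 'x_%d_b' % (i + 1), 'type': 'continuous', 'domain': (0.0, 1.0)}
          for i in range(n)]
constraints = [{'name': 'x_%d_%d_c' % (i + 1, i + 2),
                'constrain': 'x[:,%d] - x[:,%d]' % (i, i + 1)}
               for i in range(n - 1)]

bo = GPyOpt.methods.BayesianOptimization(
    f=f,
    domain=domain,
    constraints=constraints,
    kernel=GPy.kern.Matern32(input_dim=n),  # assumed keyword, forwarded to the GP model
    acquisition_type='LCB',
    acquisition_jitter=1e-4)                # assumed keyword for the jitter mentioned above
bo.run_optimization(max_iter=20)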

javiergonzalezh commented 7 years ago

Hi Joel,

Do you only see the problem when you have more than 8 variables? I can't see any obvious reason why that would happen, although you seem to be reaching the dimensionality at which the optimization (with a stationary kernel) starts to struggle. A simple test you can try is to add more jitter in the acquisition to see if that avoids the local minimum.

Just let us know.

jkaardal commented 7 years ago

Hi Javier, thank you for the suggestion.

I previously tried a few runs with a jitter of 0.01 and the algorithm still chose an infeasible point after the initialization trials. I just tried jitters of 1.0, 10.0, and 100.0 and ran each several times. With these jitters it will only sometimes acquire a feasible initial point, probably a bit more often than random but still less often than it chooses an infeasible one. There may be a slight improvement in this probability, but it is hard to say for certain since the logistic regression itself takes a while to run, and I worry that the jitter is becoming impractically large while infeasible initial acquisitions still occur.

Out of curiosity, what optimization algorithm is being used to maximize the acquisition function? From what I can surmise, the default appears to be fmin_l_bfgs_b from scipy, right? What strategy is used to enforce the inequality constraints? I have an interior-point algorithm I can try if there is an easy way to tell GPyOpt to use an external optimizer.

jkaardal commented 7 years ago

I should add that when I let GPyOpt choose random initialization trials, these random points are in the correct feasible space but the initial acquisition point following the initialization phase is still usually infeasible.

Edit: Hm, actually it looks like when n = 10 the initialization phase has trouble finding feasible points to try and the program hangs. I guess that isn't surprising, since the constraints become increasingly difficult to satisfy by drawing random points from the domain as n increases (only about 1 in n! uniform draws comes out in ascending order, roughly 1 in 3.6 million for n = 10).

jkaardal commented 7 years ago

Here is a minimal working example that reproduces the problem (SMF_example.py). Note that the algorithm nearly always terminates immediately at an infeasible point when n = 8, but at n = 10 the program hangs during the initialization phase. If n is decreased below 8, the incidence of infeasibility decreases substantially, becoming virtually non-existent at n = 4. When an infeasible acquisition occurs, the example code prints an alert to the screen.

jkaardal commented 7 years ago

I ended up locating the problem. The issue is that samples_multidimensional_uniform() from the submodule GPyOpt.util.general draws a finite number of random samples from within the bounds but neglects the inequality constraints. This was causing problems later on during the minimization of the negative acquisition function, because the L-BFGS line search cannot find an f_acqu < 0. This explanation is also consistent with the difficulty observed in drawing a feasible initial x from a uniform distribution at n = 10.

I would imagine the only quick and practical solution for this would be to allow the user to customize or replace samples_multidimensional_uniform() to meet problem-specific constraints. I attempted to do this by replacing the submodule function (e.g. GPyOpt.util.general.samples_multidimensional_uniform = mySampling) which worked for the initialization phase but reverted back to the default when I ran bayes_opt.run_optimization(max_iter=max_iter, eps=eps), probably due to a re-import of GPyOpt.util.general somewhere. Instead, modifying samples_multidimensional_uniform() in the GPyOpt source code fixed the problem (but, of course, this is not a general solution).
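
For the ordering constraints in this issue, a constraint-aware sampler is straightforward: draw uniformly inside the box and sort each row, which is valid because the objective is permutation-invariant and every variable shares the same (0, 1) bounds. A minimal sketch of such a replacement, assuming the (bounds, num_data) signature of samples_multidimensional_uniform (the function name here is just illustrative):

import numpy as np

def my_feasible_sampling(bounds, num_data):
    # Uniform draws inside the box bounds.
    lower = np.array([b[0] for b in bounds])
    upper = np.array([b[1] for b in bounds])
    samples = lower + (upper - lower) * np.random.rand(num_data, len(bounds))
    # Sorting each row enforces x_1 <= x_2 <= ... <= x_n; this stays inside
    # the box only because all variables share the same bounds.
    return np.sort(samples, axis=1)

As an aside on the monkey-patch: rebinding GPyOpt.util.general.samples_multidimensional_uniform = my_feasible_sampling only affects callers that look the function up through the module at call time; any code that imported the name directly keeps a reference to the original, which may explain why the replacement appeared to revert during run_optimization.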

A less practical but more general/automatic solution might be to provide an option to use fmin_cobyla or fmin_slsqp from scipy for the acquisition function maximization, which allow for infeasible starting points (at least fmin_cobyla is supposed to), but these may require a substantial modification of the code.

javiergonzalezh commented 7 years ago

Hi Joel,

Thanks so much for having a look at this. Would you be able to make a pull request with your changes? As you mention, replacing the optimizer may require many changes to the code. An intermediate solution could be to implement logic that checks whether an initial point is in the feasible region; if it is not, one could simply jump to the closest point in the active domain.
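
For the linear ordering constraints in this issue, that "jump to the closest feasible point" step could be a Euclidean projection. A rough sketch of the idea (this is not GPyOpt API; scipy's SLSQP and the helper name are assumptions):

import numpy as np
from scipy.optimize import minimize

def snap_to_feasible(x0, bounds):
    # Closest point to x0 (in the Euclidean sense) that respects the box
    # bounds and the ordering constraints x_1 <= x_2 <= ... <= x_n.
    n = len(x0)
    ordering = [{'type': 'ineq', 'fun': (lambda x, i=i: x[i + 1] - x[i])}
                for i in range(n - 1)]
    res = minimize(lambda x: np.sum((x - x0) ** 2), x0, method='SLSQP',
                   bounds=bounds, constraints=ordering)
    return res.x

# Example: repair a random (likely infeasible) candidate in [0, 1]^5.
print(snap_to_feasible(np.random.rand(5), [(0.0, 1.0)] * 5))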

jkaardal commented 7 years ago

Certainly, I can do that. I should be able to send a pull request sometime next week after I do some more testing. I will probably try to implement the intermediate solution you suggested as well if I have time.

nakpinar commented 7 years ago

Hi Joel, do you by any chance already have a pull request ready? I stumbled upon the same problem... Thank you!

jkaardal commented 7 years ago

Hi nakpinar,

I ended up getting distracted by other obligations and have not yet had the chance to send the pull request. I was also holding off because I was somewhat dissatisfied with my solution: it just has the user pass a new keyword argument that replaces GPyOpt.util.general.samples_multidimensional_uniform with a custom sampling function, rather than the general and automatic solution discussed in the May 24 post.

Having thought about it more, however, I am not sure an automatic solution is possible for all but the simplest constraints (e.g. linear constraints). I have tested both my own interior-point algorithm and scipy's fmin_slsqp on nonlinear constraint satisfaction problems and found that there remains a significant danger of "converging" to infeasible points. I would imagine fmin_cobyla would not be any better, since it is an even more primitive approach to constrained optimization (but I have not tested this). (As an aside, there is an intuitive explanation for why infeasible "convergence" might occur with nonlinear constraints: if one were to find a feasible point by taking the squared sum of the active constraints and computing a least-squares solution, the objective would be zero at a feasible point; but for a nonlinear problem this objective can be non-convex and have local minima with non-zero objective value.)
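
To make that aside concrete, here is a small sketch of the feasibility-as-least-squares view, using the ordering constraints from this issue as the example (the function names are illustrative, not GPyOpt API):

import numpy as np
from scipy.optimize import minimize

def feasibility_objective(x, constraint_funcs):
    # Squared sum of the violated constraints g_i(x) <= 0; this is zero
    # exactly at feasible points.
    violations = np.array([max(0.0, g(x)) for g in constraint_funcs])
    return np.sum(violations ** 2)

# The linear ordering constraints make this objective convex, so a local
# solver reliably drives it to (near) zero, i.e. a feasible, sorted point.
ordering = [lambda x, i=i: x[i] - x[i + 1] for i in range(3)]

result = minimize(feasibility_objective, np.random.rand(4), args=(ordering,))
print(result.fun, result.x)  # fun ~ 0 at a feasible point

# With nonlinear constraints the same objective can be non-convex, so a
# local solver may stall at a non-zero value, i.e. an infeasible point.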

I will send a pull request in the next couple days with the simple keyword argument solution after I run the tests.

Question for the developers: Why is the value for the key 'constrain' a string (that is later evaluated) rather than a function? Is this meant to give the user access to local and class variables?

javiergonzalezh commented 6 years ago

Thanks for the PR! Closing the issue now.