SheffieldML / GPyOpt

Gaussian Process Optimization using GPy
BSD 3-Clause "New" or "Revised" License

Multiple evaluations of a target function with the same arguments #160

Open sharpsy opened 6 years ago

sharpsy commented 6 years ago

The following example demonstrates the issue:

import GPyOpt
from numpy.random import seed

seed(1234)

def myf(args):
    print(args)  # GPyOpt passes a 2D array with one row per suggested point
    x, y = args[0]
    return (2*x)**2 + y

bounds = [{'name': 'var_1', 'type': 'discrete', 'domain': (-1, -0.5, 0.5, 1)},
          {'name': 'var_2', 'type': 'discrete', 'domain': (-1, -0.5, 0.5, 1)}]

myProblem = GPyOpt.methods.BayesianOptimization(myf, bounds)
myProblem.run_optimization(max_iter=10)

The output is:

[[ 1. -1.]]
[[ 1. -1.]]
[[ 0.5 -0.5]]
[[-0.5 1. ]]
[[-1. -0.5]]
[[-1. -1.]]
[[ 0.5 -0.5]]
[[ 0.5 -0.5]]

We can see that the function is evaluated multiple times with the same arguments: [1, -1] twice and [0.5, -0.5] three times. This function can be arbitrarily expensive to evaluate, and evaluating it again with arguments it has already seen provides no benefit to the optimizer.

Using GPyOpt version 1.2.1 with GPy version 1.8.5 on Python 3.5.2

cookiees0 commented 6 years ago

Again, I'm not a dev. The first five points are chosen randomly (by default), so my guess is that your seed just happens to pick the same point [1, -1] twice. After that the optimizer terminates as soon as a point ([0.5, -0.5] here) is sampled twice (probably for the reason you describe), instead of carrying out all 10 iterations. Setting de_duplication=True works for me: the optimizer keeps looking for new points instead of stopping once it hits a point that has already been evaluated.
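A minimal sketch of what that looks like on the toy example from the report, assuming the same bounds and objective (de_duplication is just passed straight to BayesianOptimization):

import GPyOpt

bounds = [{'name': 'var_1', 'type': 'discrete', 'domain': (-1, -0.5, 0.5, 1)},
          {'name': 'var_2', 'type': 'discrete', 'domain': (-1, -0.5, 0.5, 1)}]

def myf(args):
    x, y = args[0]
    return (2*x)**2 + y

# de_duplication=True keeps the acquisition from proposing points that have
# already been evaluated, so the 10 iterations are actually spent on new points.
myProblem = GPyOpt.methods.BayesianOptimization(myf, bounds, de_duplication=True)
myProblem.run_optimization(max_iter=10)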

(As a side note, I'm having the opposite issue: with a very noisy function I want to evaluate good points multiple times, but the optimizer either stops or gets stuck at that same point. Not sure whether what I'm trying to do is a good idea, but re-sampling might not be completely pointless.)

sharpsy commented 6 years ago

Picking the same point during the initial random search and evaluating a point twice during the optimization look like two separate issues. I am searching for the optimal parameters of models where each evaluation takes an hour or more; spending that much compute to re-evaluate a result that is already known is not acceptable.

I did not know about the de_duplication argument, thanks for that. I will try it out.
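In the meantime, one possible workaround for hour-long evaluations is a small cache around the objective, so a repeated query costs a dictionary lookup instead of another full run. This is only a sketch; CachedObjective is a hypothetical helper, not part of GPyOpt:

import numpy as np

class CachedObjective:
    """Hypothetical memoizing wrapper around an expensive objective."""
    def __init__(self, f):
        self.f = f
        self.cache = {}

    def __call__(self, args):
        # GPyOpt evaluates one row at a time by default; use it as the cache key.
        key = tuple(np.asarray(args[0]).ravel())
        if key not in self.cache:
            self.cache[key] = self.f(args)
        return self.cache[key]

# Usage sketch (expensive_model_fit stands in for the real hour-long objective):
# problem = GPyOpt.methods.BayesianOptimization(CachedObjective(expensive_model_fit), bounds)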

ekalosak commented 4 years ago

The objective function is, by default, treated as stochastic. Gaussian processes model distributions spread across a domain; the variances of these distributions are not a priori assumed to be 0, and this is a feature. However, de_duplication effectively clamps this variance and is the right solution! Consider closing this issue.
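For completeness, a sketch of how the two switches combine on the example above, assuming exact_feval and de_duplication behave as described in this thread (exact_feval=True tells the GP model the observations are noise-free; de_duplication=True prevents re-sampling visited points):

myProblem = GPyOpt.methods.BayesianOptimization(
    myf, bounds,
    exact_feval=True,      # treat evaluations as deterministic (no observation noise)
    de_duplication=True)   # do not propose already-evaluated locations again
myProblem.run_optimization(max_iter=10)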