Yelp / MOE

A global, black box optimization engine for real world metric optimization.

Discontinuous (integer) search spaces in MOE #438

Open HGeerlings opened 9 years ago

HGeerlings commented 9 years ago

I had a question regarding using MOE to make guesses over discontinuous spaces.

Say we had a space of vectors such as [1, 5, 6], [9, 23, 11], etc., which are ordered like a continuous space but take only integer values (in the example above, each coordinate is an integer between 1 and 24). Since MOE guesses float values such as 2.3948573..., I was wondering if you know of any way to implement integer guesses over the domain aside from simply rounding off the float values that MOE suggests; I feel rounding may hamper MOE's effectiveness and also lead to repeated experiments. Any word on this would be very helpful.

Thanks!

dwinston commented 9 years ago

This was raised in #369 and closed, but never truly resolved unless your problem maps neatly to a multi-armed bandit formulation. It seems your rounding/flooring tactic was the first answer given there.

jialeiwang commented 8 years ago

Right now the optimization routines that maximize ExpectedImprovement are unfortunately all designed for continuous spaces. A simple fix would be to add optimization methods that work for discrete spaces, for example a genetic algorithm, to maximize ExpectedImprovement.
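As a rough sketch of what that could look like, the snippet below runs a tiny genetic algorithm over integer vectors. The `expected_improvement` function here is only a placeholder standing in for a real call into MOE's EI machinery, and all of the GA parameters (and the target value 7) are arbitrary:

```python
import random

def expected_improvement(point):
    # Placeholder: stand-in for MOE's ExpectedImprovement evaluated at
    # `point`; replace with a call into moe.optimal_learning. The target
    # value 7 is arbitrary, just to make the sketch runnable.
    return -sum((x - 7) ** 2 for x in point)

def ga_maximize_ei(bounds, pop_size=50, generations=100, mutation_rate=0.2):
    """Maximize EI over integer vectors; `bounds` is a list of (lo, hi) ints."""
    def random_point():
        return [random.randint(lo, hi) for lo, hi in bounds]

    def mutate(point):
        # Nudge each coordinate by +/-1 with some probability, clamped to bounds.
        return [min(hi, max(lo, x + random.choice([-1, 1])))
                if random.random() < mutation_rate else x
                for x, (lo, hi) in zip(point, bounds)]

    def crossover(a, b):
        return [random.choice(pair) for pair in zip(a, b)]

    population = [random_point() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=expected_improvement, reverse=True)
        parents = ranked[:pop_size // 2]  # keep the fitter half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=expected_improvement)

print(ga_maximize_ei([(1, 24)] * 3))  # e.g. [7, 7, 7] for the placeholder EI
```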

suntzu86 commented 8 years ago

Sorry I've been gone for so long! See #448 for details.

@jialeiwang, I'm not sure adding a genetic algorithm would be so simple. How would the evolution process work? It'd need to take advantage of the structure of the covariance space to do better than just guessing integer values randomly, I would think.

@dwinston, #369 was closed because it was a question... the technical work never really happened and is quite hard in general. We were only trying to provide some color on what can currently be done with MOE.

@HGeerlings, I'll try to provide some guidance on using MOE for problems with integer components. For integers, what I have done in practice is the following:

  1. Round MOE's output to the nearest integer (see the sketch after this list).
  2. Restrict the hyperparameter domain. Since the values are ints, having a length scale of 0.001 is ridiculous. I've found a minimum in the range of 0.4 to 0.7 works well (it's case-dependent, though); 0.5 has been a reasonable default. For the maximum, something like 1 to 10x the domain width is usually fine.
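To make step 1 concrete, here is a minimal loop in the style of the easy_interface example from MOE's README, with the rounding bolted on. It assumes a running MOE server; the objective, domain, and noise variance are placeholders for the original poster's setup:

```python
from moe.easy_interface.experiment import Experiment
from moe.easy_interface.simple_endpoint import gp_next_points
from moe.optimal_learning.python.data_containers import SamplePoint

def objective(point):
    # Placeholder metric over integer vectors; replace with the real experiment.
    return sum((x - 12) ** 2 for x in point)

exp = Experiment([[1, 24], [1, 24], [1, 24]])  # integer domain from the question

# Seed with one observation so the GP has something to condition on.
seed = [1, 1, 1]
exp.historical_data.append_sample_points([SamplePoint(seed, objective(seed), 0.01)])

for _ in range(20):
    suggestion = gp_next_points(exp)[0]              # floats, e.g. [2.3948..., ...]
    candidate = [int(round(x)) for x in suggestion]  # step 1: round to ints
    exp.historical_data.append_sample_points(
        [SamplePoint(candidate, objective(candidate), 0.01)]  # record the rounded point
    )
```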

To see more of what I mean, fire up MOE's plotting interface. Add a point at 0 with function value 1. Then vary the length scale and see what the space in between 0 and 1 looks like. You'll notice that at, say, length=0.1, MOE returns to the prior before you reach x=1. This is nonsense b/c if x is integral, we know there is no signal between 0 and 1. If you set length=2.0, then at x=1.0 MOE still strongly feels the signal from x=0.0. This may destroy real signal in your problem.
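The same effect shows up numerically in the square exponential covariance (MOE's default kernel family); the signal variance of 1.0 below is just an assumption for illustration:

```python
import math

def sq_exp(x1, x2, length, alpha=1.0):
    # Square exponential covariance between two scalar inputs.
    return alpha * math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))

print(sq_exp(0.0, 1.0, length=0.1))  # ~1.9e-22: back to the prior before x=1
print(sq_exp(0.0, 1.0, length=0.5))  # ~0.14: the suggested middle ground
print(sq_exp(0.0, 1.0, length=2.0))  # ~0.88: x=1 still strongly feels x=0
```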

It's a tradeoff. If you make the min hyperparameter too large, you can mute out higher-frequency signals. If you make it too small, MOE could always want to explore, say, 1.5 (if you've given it 1 and 2) b/c the uncertainty in the middle will be very high (i.e., almost back to the prior).

As for effectiveness, you are exactly right. It does hamper MOE's effectiveness to do this. But picking the right hyperparameters can mitigate the downsides. If there are a lot of integer dimensions (and few or no continuous ones), the negative performance impact tends to be exacerbated.

If you're finding too many repeated experiments, bump up the length scale a little so that MOE is less interested in the space in between integers. Also, you may get repeated experiments because MOE is done. As a dumb example, if I want to optimize an integer x on [0, 1], after sampling 0 and 1, there's nothing else to do. More generally, you may find that even after increasing the length scale substantially, MOE still wants to resample a known point... this would imply that the unknown points are all expected to be terrible and MOE believes it is done. In a non-integer problem, this would be true.

Not so in an integer problem though :( You could try...
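For instance, one cheap way to detect the "MOE believes it is done" case is to compare each rounded suggestion against everything already sampled; gp_next_points' internal optimization is stochastic, so a retry can land somewhere new, and if it never does you can treat that as MOE saying it's done. This helper is hypothetical, not part of MOE's API:

```python
from moe.easy_interface.simple_endpoint import gp_next_points

sampled = set()  # rounded points already run, stored as tuples

def next_unsampled(exp, max_retries=5):
    """Return a fresh rounded suggestion, or None if MOE keeps repeating itself."""
    for _ in range(max_retries):
        candidate = tuple(int(round(x)) for x in gp_next_points(exp)[0])
        if candidate not in sampled:
            sampled.add(candidate)
            return list(candidate)
    return None  # every retry hit a known point: likely time to stop
```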

Super long-term there are other ideas on how to make our GP model handle integers more intelligently. We'd probably want to adapt some kind of integer optimization scheme. Maybe an approximate one like a genetic algorithm. Maybe an exact one like branch & bound (where you use the continuous relaxation to bound the expected improvement in, say, 0 <= x < 5 vs 5 <= x <= 10 if x is an integer in [0, 10]). These are probably pretty far off though...
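For flavor, here is a very rough best-first sketch of that branch & bound idea. Both hooks are hypothetical: `max_continuous_ei(box)` would return an upper bound on EI over the continuous relaxation of a box, and `ei_at(point)` would evaluate EI at one integer point; neither exists in MOE today.

```python
import heapq
import itertools

def branch_and_bound_ei(bounds, max_continuous_ei, ei_at):
    """Find an integer point maximizing EI in `bounds`, a list of (lo, hi) ints."""
    best_value, best_point = float("-inf"), None
    heap = [(-max_continuous_ei(bounds), bounds)]  # max-heap via negation
    while heap:
        neg_bound, box = heapq.heappop(heap)
        if -neg_bound <= best_value:
            break  # no remaining box can beat the incumbent
        if all(hi - lo <= 1 for lo, hi in box):
            # Small box: enumerate its integer points exactly.
            for point in itertools.product(*(range(lo, hi + 1) for lo, hi in box)):
                value = ei_at(list(point))
                if value > best_value:
                    best_value, best_point = value, list(point)
            continue
        # Otherwise split the widest dimension into two disjoint halves.
        d = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
        lo, hi = box[d]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            child = list(box)
            child[d] = half
            heapq.heappush(heap, (-max_continuous_ei(child), child))
    return best_point, best_value
```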

RokoMijic commented 7 years ago

> Super long-term there are other ideas on ...

In the short-term it would be nice if MOE had some way of handling integer dimensions.

For example, MOE could handle an integer dimension by running a GP with a covariance length scale of 0.5 in that dimension, concentrating all probability mass onto integer values, and restricting the domain of the acquisition function to integers in that dimension. This would avoid the really hacky solutions you described above, e.g.:

> restrict the domain bounds, ... add a fake point

It's probably not going to be as good as the best possible way of handling integers, but it's a strict improvement on adding fake points. Are you open to pull requests for something like this?
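Something like the acquisition-side half of that proposal could look like the sketch below for a mixed integer/continuous domain: enumerate every integer combination and random-restart over the continuous dimensions. The `acquisition(int_part, cont_part)` signature is assumed for illustration, and exhaustive enumeration obviously blows up with many integer dimensions (which is where ideas like the genetic algorithm or branch & bound above would come in).

```python
import itertools
import random

def maximize_mixed(acquisition, int_bounds, cont_bounds, n_restarts=100):
    """Maximize `acquisition` over a mixed domain.

    int_bounds: list of (lo, hi) integer ranges; cont_bounds: list of
    (lo, hi) float ranges. The acquisition function (e.g. EI) is assumed
    to take the integer part and the continuous part separately.
    """
    best_value, best_point = float("-inf"), None
    for int_part in itertools.product(*(range(lo, hi + 1) for lo, hi in int_bounds)):
        for _ in range(n_restarts):
            cont_part = [random.uniform(lo, hi) for lo, hi in cont_bounds]
            value = acquisition(list(int_part), cont_part)
            if value > best_value:
                best_value, best_point = value, list(int_part) + cont_part
    return best_point
```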