Open ekalosak opened 4 years ago

On https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/core/bo.py#L138, the response to a `LinAlgError` in `bo._update_model()` is to `break` and end the optimization. This seems a little heavy-handed - exiting an optimization early because of a hyperparameter optimization error doesn't sit right. What is the rationale behind this choice? Would retrying a couple of times be a sensible alternative? Thanks for any/all input.
Don't know about the original thinking behind this line. But I traced the call stack back to the Paramz package, and there the error is only raised after a number of restarts (10 by default, and I don't believe that default is changed anywhere). So it appears that if you hit this error at the GPyOpt level, a simple optimization restart won't help.
That's just a theory though. Do you have code that runs into this exception? It would be nice to see if adding some simple retry logic would help.
Preliminary results confirm your hunch - I added a little retry loop:
```python
from numpy.linalg import LinAlgError

nfail = 0  # count consecutive model-update failures
done = False
while not done:
    try:
        bo._update_model()
        done = True
    except LinAlgError as e:
        nfail += 1
        if nfail > 5:  # give up after five consecutive failures
            raise e
```
and it did nothing to stop the error. It was admittedly a fast hack - the real solution seems to require some digging. My new approach is to prune the X matrix upon getting an error - the traceback points to `set_XY` in the `GPy.gp.GP` object, which appears to be the trigger for the paramz `updatable` object's `update_model` - such a long execution stack!
It's a sporadic error - `bo.run_optimization()` is sufficient to generate it on a discrete/categorical response surface, but you have to get unlucky to reproduce it.
To reproduce it in any discrete space, simply let the optimization run long enough and you'll end up with a `bo.X` having sufficiently many duplicate rows (`bo.X.shape[0]=110` and `np.unique(bo.X,axis=0).shape[0]=30`, say) to cause even jittered Cholesky decompositions to fail in GPy (https://github.com/SheffieldML/GPy/blob/devel/GPy/util/linalg.py#L75). NB: you need to make sure `dedupe` is off.
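A quick way to check how close you are to that regime (a sketch, assuming a mid-run `bo` optimization object):

```python
import numpy as np

# Sketch: measure duplication in the evaluated design matrix of an
# existing GPyOpt optimization object `bo`.
n_total = bo.X.shape[0]
n_unique = np.unique(bo.X, axis=0).shape[0]
print(f"{n_total} evaluations over {n_unique} unique points")
# A large gap (110 vs. 30 as above) means the kernel matrix over bo.X is
# nearly rank-deficient, and the next refit is likely to raise LinAlgError.
```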
I've found a workaround, but I'm not sure if it's general enough to be a useful PR:
```python
while True:  # outer optimization loop
    try:
        bo._update_model()
    except LinAlgError as e:
        # Prune near-duplicate evaluations and force exploration away
        # from already-evaluated points, then retry the model update.
        bo.X, bo.Y = remove_dense_points_with_little_variation(bo.X, bo.Y)
        bo.X = explore_more_points_far_from_already_evaluated(bo.X)
        continue
    ...  # rest of the optimization step
```
This is particularly useful in spaces with no continuous dimensions, because the next suggestion is snapped to the nearest lattice point, and, depending on your acquisition function (EI and MPI are the worst offenders, afaik), new suggestions are concentrated around 'very good' previous evaluations.
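A rough, illustrative sketch of what those helpers could look like (not the actual private implementation; a real version would sample from the design space itself and snap to the lattice):

```python
import numpy as np

def remove_dense_points_with_little_variation(X, Y):
    # Illustrative: collapse exact duplicate rows of X, keeping one
    # representative Y per unique point.
    _, idx = np.unique(X, axis=0, return_index=True)
    return X[idx], Y[idx]

def explore_more_points_far_from_already_evaluated(X, n_new=5):
    # Illustrative: sample random candidates and keep the n_new with the
    # largest minimum Euclidean distance to any already-evaluated point.
    candidates = np.random.rand(1000, X.shape[1])
    dists = np.linalg.norm(candidates[:, None, :] - X[None, :, :], axis=-1)
    far = candidates[np.argsort(dists.min(axis=1))[-n_new:]]
    return np.vstack([X, far])
```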
It might be better to leave it as it is, `break`, and say, as it does, "optimization complete". What do you think, @apaleyes, about introducing an `explore_more_when_stuck: bool` optional kwarg?
First of all, let me see if I understand this right. Are we essentially heading towards "optimization is stuck at a local extremum, let's jump elsewhere at random and start anew"? Or is it more subtle?
It is not more subtle; you hit the nail on the head. Admittedly, this is a blunt instrument. I'd love to move towards a more principled or correct solution, so any advice on that end would be greatly appreciated.
That's not necessarily a bad idea, at least as a first try. The API should probably say how many times we are allowed to jump. But it should be extensively tested, as I'd rather leave it as is than introduce some new parameters that just confuse the whole thing. I'm also not sure the implementation is that simple, e.g. what should be done with the acquisition function then?
@javiergonzalezh any thoughts?
Having implemented this solution privately and having used it in a few tens of thousands of optimization runs, I'm pretty happy with it - and the implementation is just that simple. Whether it's right is a totally separate question, of course.
Adding an API parameter with a default `num_times_to_jump_when_stuck` value isn't a huge lift.
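At the call site it might look something like this (hypothetical kwargs, only sketching the proposed API; `objective` and `domain` are whatever you're already optimizing over):

```python
import GPyOpt

bo = GPyOpt.methods.BayesianOptimization(f=objective, domain=domain)
# Hypothetical kwargs sketching the proposal; not part of the current API.
bo.run_optimization(max_iter=100,
                    explore_more_when_stuck=True,
                    num_times_to_jump_when_stuck=3)
```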
I'm not sure I understand what needs to be done with the acquisition function in this case - there's no surrogate model fitting nor acquisition sub-optimization required if `explore_more_points_far_from_already_evaluated` is only contingent on the `X` data and the design space. Maybe I'm missing a subtlety with the parameter update triggers or something like that?