EmuKit / emukit

A Python-based toolbox of various methods in decision making, uncertainty quantification and statistical emulation: multi-fidelity, experimental design, Bayesian optimisation, Bayesian quadrature, etc.
https://emukit.github.io/emukit/
Apache License 2.0
603 stars 128 forks source link

The proper way of combining HMC with batch acquisition strategies for Bayesian Optimization #361

Open BrunoKM opened 3 years ago

BrunoKM commented 3 years ago

This issue is meant to ask two questions: (1) Does the combination of Local Penalization and IntegratedHyperparameterAcquisition behave in the expected way, and (1) What's the right way to approach Batch Bayesian Optimization using multiple Hamiltonian Monte-Carlo (HMC) samples for model hyperparameters in EmuKit?

(1) Combining Local Penalization and IntegratedHyperparameterAcquisition

Currently, when doing sequential Bayesian Optimization (BO) in EmuKit, one can wrap the acquisition function in a wrapper – IntegratedHyperParameterAcquisition – and that wrapper remains responsible for generating and marginalizing-out the HMC samples:

acquisition = IntegratedHyperparameterAcquisition(emukit_model, ExpectedImprovement, n_burnin=100, n_samples=20)

When using the IntegratedHyperparameterAcquisition abstraction, the samples are only being marginalised-out when the acquisition function is called:

# All the HMC samples are used to evaluate the one-point acquisition
acquisition.evaluate(x)

But not when any of the model's predict functions are called:

# Only 1 HMC sample is used (whichever one the model happened to be updated with last) 
emukit_model.predict(x)

This means that when an IntegratedHyperParameterAcquisition is passed into a batch point calculator — such as the LocalPenalizaitonPointCalculator — the HMC samples won't necessarily be used in all places during computation of the batch acquisition.


What happens when IntegratedHyperparameterAcquisitionFunction is combined with LocalPenlizationPointCalculator

In the specific case of the LocalPenlizationPointCalculator, this means that HMC samples will be used to evaluate the single-point acquisition function that's being penalized, but only a single arbitrary sample will be used to estimate the Lipschitz constant: https://github.com/EmuKit/emukit/blob/b7939897ebcff2281d0bb017186fd4ecf059df93/emukit/bayesian_optimization/local_penalization_calculator.py#L68

or to evaluate the penalizer: https://github.com/EmuKit/emukit/blob/b7939897ebcff2281d0bb017186fd4ecf059df93/emukit/bayesian_optimization/local_penalization_calculator.py#L54

This doesn't seem like the behaviour the user would intend in this case. Using an arbitrary HMC sample for estimation of the Lipschitz constant could lead to undesirable, arbitrary behaviour. Depending on which HMC sample happens to be last in the collection of generated samples, the effect of local penalization will be vastly different. This seems to have a significant effect on performance of Batch BO + HMC.

At the same time, there isn't an obvious way to properly consider all HMC samples in combination with the LocalPenlizationPointCalculator in Emukit, without rewriting that class yourself.


This brings me to another question: where in the hierarchy of abstractions in EmuKIt should generating samples for hyperparameters, and marginalising them out, fit in when doing Batch Bayesian Optimisation? This is an issue not just for the Local Penalization calculator, but also for hypothetical new batch point calculators the user might want to implement:

Combining IntegratedHyperparameterAcquisition with other batch acquisition methods

In another use-case, when someone might want to implement their own batch point calculator. As an example, they might want to use the MultipointExpectedImprovement, but select points for the batch by sequentially optimizing 1-point Expected Improvement (EI), then 2-point EI wrt. the second point, 3-point EI wrt. the third point, etc. (as described in Fast Computation of Expected Improvement).

In such cases, as well as in the case of a locally penalized acquisition function, the acquisition function changes at each iteration of the batch. One could wrap this new acquisition in an IntegratedHyperParameterAcquisition wrapper for every time, but this would require re-running HMC every time. One could also try to construct an acquisition_generator that returns the right acquisition function at each batch-step, but this seems feels like a complicated work-around. IntegratedHyperparameterAcquisition seems unsuitable for marginalising out hyperparameter samples for Batch Bayesian Optimisation, including for Local Penalization.

This to me hints at the possibility that storing the HMC samples in an abstraction related to the model, rather than the acquisition, might be a more generalisable approach? (I don't have a concrete suggestion for what that might look like at the moment) If there is a relatively simple way to overcome these difficulties with Emukit methods, I'd love to know.

BrunoKM commented 3 years ago

One small possible change would be to cache hyperparameter samples from model.generate_hyperparameter_samples(). This way each new instantiation of IngregratedHyperparameterAcquisition wouldn't needlessly regenerate the samples unless the model's been updated.

apaleyes commented 3 years ago

I can't claim understanding all details of what this issue is talking about. But as far as I can see, you indeed need a custom batch point calculator implementation. It can be based on existing LP implementation, or completely new. That's in line with the general idea of the library - the idea was never to support every possible use case out of the box, but to give people flexibility to twist and bend the library to adapt it to their needs.

BrunoKM commented 3 years ago

I guess the main point was that combining LocalPenlizationPointCalculator and IntegratedHyperParameterAcquisition can lead to unexpected (although logical) and undesirable behaviour, since LocalPenlizationPointCalculator won't marginalise out the samples when computing the Lipschitz constant or the penalizer.

I have failed to realise that myself when first combining the two. I think at the very least it would be great to have a note in the documentation for LocalPenlizationPointCalculator warning of this behaviour.

BrunoKM commented 3 years ago

If the idea is that it's on the user to construct a batch point calculator that marginalises out all the HMC samples when computing the penalized acquisition, I was wondering if there is a proper way of going about this without rewriting much of EmuKit functionality.

apaleyes commented 3 years ago

Well, it is and it isn't. Emukit is designed in a way that it shall be relatively easy for users to define their own behavior, if something isn't built in the package already.

Hw much of functionality exacly we are talking about? Again, at the moment it seems to be as though you need new acquisition implementation (that's essentially what point calculator is).