Open BrunoKM opened 3 years ago
One small possible change would be to cache the hyperparameter samples from model.generate_hyperparameter_samples(). This way, each new instantiation of IntegratedHyperParameterAcquisition wouldn't needlessly regenerate the samples unless the model has been updated.
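A minimal sketch of what such a cache could look like. ToyModel, its version counter, and SampleCache are hypothetical stand-ins written for illustration; none of them are Emukit classes, and the "sampler" here just returns fixed numbers instead of running HMC.

```python
from typing import List, Optional


class ToyModel:
    """Hypothetical stand-in for a model wrapper; not Emukit's API."""

    def __init__(self) -> None:
        self.version = 0        # bumped whenever the model is updated
        self.sampler_calls = 0  # counts how often the expensive sampler ran

    def set_data(self) -> None:
        # Pretend new observations arrived and the model was refitted.
        self.version += 1

    def generate_hyperparameter_samples(self, n_samples: int = 3) -> List[float]:
        # Placeholder for a full HMC run.
        self.sampler_calls += 1
        return [0.5 + 0.25 * i + self.version for i in range(n_samples)]


class SampleCache:
    """Re-runs the sampler only when the model version has changed."""

    def __init__(self, model: ToyModel) -> None:
        self.model = model
        self._version: Optional[int] = None
        self._samples: List[float] = []

    def get(self, n_samples: int = 3) -> List[float]:
        stale = self._version != self.model.version
        if stale or len(self._samples) != n_samples:
            self._samples = self.model.generate_hyperparameter_samples(n_samples)
            self._version = self.model.version
        return self._samples


model = ToyModel()
cache = SampleCache(model)
first = cache.get()
second = cache.get()   # served from the cache, no new sampling
model.set_data()
third = cache.get()    # the model changed, so the samples are regenerated
```

The key design question is how to detect that "the model's been updated"; the version counter above is just one assumed way of doing so.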
I can't claim to understand all the details of what this issue is talking about. But as far as I can see, you do indeed need a custom batch point calculator implementation. It can be based on the existing LP implementation, or be completely new. That's in line with the general idea of the library: the idea was never to support every possible use case out of the box, but to give people the flexibility to twist and bend the library to adapt it to their needs.
I guess the main point was that combining LocalPenalizationPointCalculator and IntegratedHyperParameterAcquisition can lead to unexpected (although logical) and undesirable behaviour, since LocalPenalizationPointCalculator won't marginalise out the samples when computing the Lipschitz constant or the penalizer.
I failed to realise that myself when first combining the two. I think at the very least it would be great to have a note in the documentation for LocalPenalizationPointCalculator warning of this behaviour.
If the idea is that it's on the user to construct a batch point calculator that marginalises out all the HMC samples when computing the penalized acquisition, I was wondering if there is a proper way of going about this without rewriting much of Emukit's functionality.
Well, it is and it isn't. Emukit is designed so that it should be relatively easy for users to define their own behavior if something isn't already built into the package.
How much functionality exactly are we talking about? Again, at the moment it seems to me as though you need a new acquisition implementation (that's essentially what a point calculator is).
This issue is meant to ask two questions: (1) Does the combination of Local Penalization and IntegratedHyperParameterAcquisition behave in the expected way, and (2) What's the right way to approach Batch Bayesian Optimization using multiple Hamiltonian Monte-Carlo (HMC) samples for model hyperparameters in Emukit?

(1) Combining Local Penalization and IntegratedHyperParameterAcquisition
Currently, when doing sequential Bayesian Optimization (BO) in Emukit, one can wrap the acquisition function in a wrapper, IntegratedHyperParameterAcquisition, and that wrapper remains responsible for generating and marginalising out the HMC samples.

When using the IntegratedHyperParameterAcquisition abstraction, the samples are only marginalised out when the acquisition function is called, but not when any of the model's predict functions are called.
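The gap described here can be illustrated with a toy example. All classes below are hypothetical stand-ins written for this sketch (they only mimic the pattern of a wrapper that loops over hyperparameter samples); they are not Emukit code, and the "model" is just a line with a single slope hyperparameter.

```python
from typing import List


class ToyModel:
    """Hypothetical model with one hyperparameter (a slope)."""

    def __init__(self) -> None:
        self.slope = 1.0

    def fix_model_hyperparameters(self, sample: float) -> None:
        self.slope = sample

    def predict(self, x: float) -> float:
        # Uses whichever hyperparameter value is currently set.
        return self.slope * x


class ToyAcquisition:
    """Toy acquisition: simply the model prediction."""

    def __init__(self, model: ToyModel) -> None:
        self.model = model

    def evaluate(self, x: float) -> float:
        return self.model.predict(x)


class ToyIntegratedAcquisition:
    """Averages the wrapped acquisition over hyperparameter samples."""

    def __init__(self, model: ToyModel, acquisition: ToyAcquisition,
                 samples: List[float]) -> None:
        self.model = model
        self.acquisition = acquisition
        self.samples = samples

    def evaluate(self, x: float) -> float:
        total = 0.0
        for sample in self.samples:
            self.model.fix_model_hyperparameters(sample)
            total += self.acquisition.evaluate(x)
        return total / len(self.samples)


model = ToyModel()
samples = [1.0, 2.0, 6.0]
integrated = ToyIntegratedAcquisition(model, ToyAcquisition(model), samples)

integrated_value = integrated.evaluate(2.0)  # average over all three samples
direct_prediction = model.predict(2.0)       # only sees the last sample set
```

In this sketch the wrapper's value reflects every sample, while a direct call to the model's predict afterwards reflects only whichever sample the loop happened to leave in place.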
This means that when an IntegratedHyperParameterAcquisition is passed into a batch point calculator, such as the LocalPenalizationPointCalculator, the HMC samples won't necessarily be used in all places during computation of the batch acquisition.

What happens when IntegratedHyperParameterAcquisition is combined with LocalPenalizationPointCalculator
In the specific case of the LocalPenalizationPointCalculator, this means that HMC samples will be used to evaluate the single-point acquisition function that's being penalized, but only a single arbitrary sample will be used to estimate the Lipschitz constant: https://github.com/EmuKit/emukit/blob/b7939897ebcff2281d0bb017186fd4ecf059df93/emukit/bayesian_optimization/local_penalization_calculator.py#L68 or to evaluate the penalizer: https://github.com/EmuKit/emukit/blob/b7939897ebcff2281d0bb017186fd4ecf059df93/emukit/bayesian_optimization/local_penalization_calculator.py#L54
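To make the difference concrete, here is a toy sketch comparing a Lipschitz estimate taken from a single sample with one combined over all samples. Everything in it is an assumption made for illustration: the ToyModel, its mean_gradient method, the grid-based estimator, and in particular the choice to average the per-sample estimates, which is only one possible way of combining them and not something Emukit prescribes.

```python
import math
from typing import List, Sequence


class ToyModel:
    """Hypothetical model whose mean is amplitude * sin(x)."""

    def __init__(self) -> None:
        self.amplitude = 1.0

    def fix_model_hyperparameters(self, sample: float) -> None:
        self.amplitude = sample

    def mean_gradient(self, x: float) -> float:
        # Derivative of amplitude * sin(x) with respect to x.
        return self.amplitude * math.cos(x)


def estimate_lipschitz(model: ToyModel, grid: Sequence[float]) -> float:
    """Largest gradient magnitude of the model mean over a grid of points."""
    return max(abs(model.mean_gradient(x)) for x in grid)


def estimate_lipschitz_over_samples(model: ToyModel, grid: Sequence[float],
                                    samples: Sequence[float]) -> float:
    """Average of the per-sample Lipschitz estimates (one assumed choice)."""
    estimates: List[float] = []
    for sample in samples:
        model.fix_model_hyperparameters(sample)
        estimates.append(estimate_lipschitz(model, grid))
    return sum(estimates) / len(estimates)


grid = [0.1 * i for i in range(32)]  # points from 0.0 to 3.1
samples = [1.0, 2.0, 6.0]
model = ToyModel()

# Single-sample estimate: depends entirely on which sample was set last.
model.fix_model_hyperparameters(samples[-1])
single_sample_estimate = estimate_lipschitz(model, grid)

# Estimate that takes every sample into account.
averaged_estimate = estimate_lipschitz_over_samples(model, grid, samples)
```

With these toy numbers, the single-sample estimate is twice the averaged one, which would change how aggressively nearby points are penalized.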
This doesn't seem like the behaviour the user would intend in this case. Using an arbitrary HMC sample for estimation of the Lipschitz constant could lead to undesirable, arbitrary behaviour. Depending on which HMC sample happens to be last in the collection of generated samples, the effect of local penalization will be vastly different. This seems to have a significant effect on performance of Batch BO + HMC.
At the same time, there isn't an obvious way to properly consider all HMC samples in combination with the LocalPenalizationPointCalculator in Emukit without rewriting that class yourself.

This brings me to another question: where in the hierarchy of abstractions in Emukit should generating samples for hyperparameters, and marginalising them out, fit in when doing Batch Bayesian Optimisation? This is an issue not just for the Local Penalization calculator, but also for hypothetical new batch point calculators the user might want to implement.
Combining IntegratedHyperParameterAcquisition with other batch acquisition methods

Another use case is when someone wants to implement their own batch point calculator. As an example, they might want to use the MultipointExpectedImprovement, but select points for the batch by sequentially optimizing 1-point Expected Improvement (EI), then 2-point EI wrt. the second point, 3-point EI wrt. the third point, etc. (as described in Fast Computation of Expected Improvement).

In such cases, as well as in the case of a locally penalized acquisition function, the acquisition function changes at each iteration of the batch. One could wrap this new acquisition in an IntegratedHyperParameterAcquisition wrapper every time, but this would require re-running HMC every time. One could also try to construct an acquisition_generator that returns the right acquisition function at each batch step, but this feels like a complicated work-around.

IntegratedHyperParameterAcquisition seems unsuitable for marginalising out hyperparameter samples for Batch Bayesian Optimisation, including for Local Penalization.

This to me hints at the possibility that storing the HMC samples in an abstraction related to the model, rather than the acquisition, might be a more generalisable approach. (I don't have a concrete suggestion for what that might look like at the moment.) If there is a relatively simple way to overcome these difficulties with Emukit methods, I'd love to know.
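Purely as a hypothetical illustration of what "storing the samples with the model" could mean, the sketch below wraps a toy model so that predict itself marginalises over a fixed set of hyperparameter samples. ToyModel and MarginalisedModel are invented for this example and are not Emukit classes; the combination rule is moment matching of an equally weighted mixture of Gaussians, which is an approximation chosen for this sketch.

```python
from typing import List, Sequence, Tuple


class ToyModel:
    """Hypothetical model: predictive mean and variance depend on a scale."""

    def __init__(self) -> None:
        self.scale = 1.0

    def fix_model_hyperparameters(self, sample: float) -> None:
        self.scale = sample

    def predict(self, x: float) -> Tuple[float, float]:
        # Returns (mean, variance) for the current hyperparameter value.
        return self.scale * x, self.scale


class MarginalisedModel:
    """Holds the samples and marginalises them out inside predict."""

    def __init__(self, model: ToyModel, samples: Sequence[float]) -> None:
        self.model = model
        self.samples = list(samples)  # generated once, reused on every call

    def predict(self, x: float) -> Tuple[float, float]:
        means: List[float] = []
        variances: List[float] = []
        for sample in self.samples:
            self.model.fix_model_hyperparameters(sample)
            mean, variance = self.model.predict(x)
            means.append(mean)
            variances.append(variance)
        n = len(self.samples)
        mixture_mean = sum(means) / n
        # Second moment of the mixture, then subtract the squared mean.
        second_moment = sum(v + m * m for m, v in zip(means, variances)) / n
        return mixture_mean, second_moment - mixture_mean ** 2


marginalised = MarginalisedModel(ToyModel(), samples=[1.0, 3.0])
mean, variance = marginalised.predict(2.0)
```

Under this assumed design, any batch point calculator that only talks to the model through predict would see marginalised predictions without knowing about the samples, at the cost of approximating the mixture by its first two moments.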