PKU-DAIR / open-box

Generalized and Efficient Blackbox Optimization System
https://open-box.readthedocs.io

how to control optimization process #84

Open rmrmg opened 9 months ago

rmrmg commented 9 months ago

There is very little information in the documentation, so based on the source code it seems to me that the following options control the optimization process (all from https://github.com/PKU-DAIR/open-box/blob/96841e65a7e9900e5549d5f68411db35500f2e92/openbox/optimizer/generic_smbo.py):

  1. advisor_type - basically defines how new points for testing are proposed. There are several options.

  2. for BO you need to define a surrogate model:

    • 'gp' (Gaussian process) - becomes slow after a few hundred points; according to the docs it should work best for mathematical problems (whatever that means)
    • 'prf' (probabilistic random forest) - the documentation says it works best for practical problems (whatever that is); I guess it should be faster for long optimizations (several hundred points and more)
    • 'lightgbm' - this is gradient boosting, which should work well, but I could not find any word about it in the docs. Is it worse than prf?
  3. for BO you need to define acq_type, the algorithm for proposing new points (a trade-off between exploration and exploitation). Among the available options, 'ei' (default) and 'pi' are well known and very popular and try to balance exploration and exploitation; 'lcb' promotes exploration, i.e. you will learn the shape of your space but not necessarily find the best solution in it.

  4. acq_optimizer_type - I don't know what it is or when it is applicable.

  5. init_strategy; does the strategy apply only for the initial_runs (default=3) number of steps? Does it affect a long (300+ steps) optimization process? A sketch of the constructor call, as I read it from the source, is below. @open-box-team could you please comment and answer these questions?
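To make my reading concrete, this is roughly how I understand the constructor call. A sketch only: the keyword names are the ones I see in generic_smbo.py, while the toy space, objective, values, and task_id are my own guesses.

```python
from openbox import Optimizer, sp

# Toy search space and objective (my own example, not from the repo).
space = sp.Space()
space.add_variables([
    sp.Real("x1", -5.0, 10.0, default_value=0.0),
    sp.Real("x2", 0.0, 15.0, default_value=0.0),
])

def rosenbrock(config):
    x1, x2 = config["x1"], config["x2"]
    y = 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
    return {"objs": [y]}  # newer releases use the key "objectives"

opt = Optimizer(
    rosenbrock,
    space,
    max_runs=100,
    advisor_type="default",             # 'default' = BO; also 'tpe', 'ea', 'random', 'mcadvisor'
    surrogate_type="gp",                # BO only: 'gp', 'prf', 'lightgbm'
    acq_type="ei",                      # BO only: 'ei', 'pi', 'lcb', ...
    acq_optimizer_type="local_random",  # how the acquisition function is maximized
    initial_runs=3,                     # number of warm-up configurations
    init_strategy="random_explore_first",
    task_id="issue_84",
)
history = opt.run()
print(history)
```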

jhj0411jhj commented 9 months ago

Hi @rmrmg, for your questions:

  1. advisor_type:

    • 'tpe': Tree-structured Parzen Estimator (paper link) - uses a kernel density estimator and computes EI skillfully, so surrogate_type and acq_type are not used.
    • 'ea': Evolutionary Algorithms - EA hardly takes any time to compute, while BO needs to fit a model in each iteration. If you want to run the optimization for more rounds (1000-10000), you may use EA. EA is also a competitive method if the config space is full of categorical parameters (e.g. in Neural Architecture Search). A minimal EA example is sketched after this list.
    • 'random': Random Search - this is probably totally useless for real-life applications (although if the input dimension is very high, many methods may not perform better than RS).
    • 'mcadvisor': Bayesian Optimization with Monte Carlo Sampling - MC algorithms compute acquisition values via MC sampling instead of using the posterior mean and variance of the GP. In some cases this accelerates the calculation, but currently we suggest using the common Advisor and acquisition functions like EI.
  2. for BO you need to define a surrogate model:

    • 'gp' (Gaussian process) - becomes slow after a few hundred points; it works best for mathematical problems (that is correct). Not feasible for large input spaces.
    • 'prf' (probabilistic random forest) - works faster for long optimizations (several hundred points and more), and performs better than GP if there are many categorical parameters in the config space or the number of parameters is large.
    • 'lightgbm' - its performance is not tested. We suggest using prf instead.
    • You may set surrogate_type='auto' to automatically choose a proper model for the optimization.
  3. for BO you need to define acq_type. EI is the most common choice for single-objective optimization.

  4. acq_optimizer_type: set it to 'random_scipy' if the space consists entirely of continuous parameters (int and float); it costs more time but performs better. Otherwise set it to 'local_random'. You can also set it to 'auto'.

  5. init_strategy: the strategy applies only for the initial_runs (default=3) number of steps. It controls the generation of the initial configs. After the initial steps, the optimization uses BO to generate new configs, so it does not control the long (300+ steps) optimization process. However, the overall optimization result depends to some extent on the selection of the initial points. The BO sketch after this list sets these parameters explicitly.
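If it helps, here is a minimal runnable sketch of switching to the EA advisor. The toy problem and task_id are just for illustration; on the version in your linked commit the result key is "objs", while newer releases use "objectives".

```python
from openbox import Optimizer, sp

space = sp.Space()
space.add_variables([
    sp.Real("x1", -5.0, 10.0, default_value=0.0),
    sp.Real("x2", 0.0, 15.0, default_value=0.0),
])

def rosenbrock(config):
    x1, x2 = config["x1"], config["x2"]
    y = 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
    return {"objs": [y]}  # newer releases use the key "objectives"

# EA does not fit a surrogate model, so surrogate_type and acq_type are ignored.
opt = Optimizer(
    rosenbrock,
    space,
    advisor_type="ea",
    max_runs=1000,  # EA iterations are cheap, so many rounds are affordable
    task_id="ea_demo",
)
history = opt.run()
print(history)
```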
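And a sketch of a plain BO run with the knobs from items 2-5 set explicitly; the toy objective and the concrete values are illustrative, not recommendations.

```python
from openbox import Optimizer, sp

space = sp.Space()
space.add_variables([sp.Real("x", -10.0, 10.0, default_value=0.0)])

def quadratic(config):
    return {"objs": [(config["x"] - 1.0) ** 2]}  # newer releases: "objectives"

opt = Optimizer(
    quadratic,
    space,
    advisor_type="default",                # the standard BO advisor
    surrogate_type="prf",                  # or "gp", or "auto"
    acq_type="ei",                         # the usual single-objective choice
    acq_optimizer_type="auto",             # "random_scipy" if all parameters are continuous
    initial_runs=3,                        # warm-up configs drawn before BO starts
    init_strategy="random_explore_first",  # how those warm-up configs are generated
    max_runs=300,
    task_id="bo_demo",
)
history = opt.run()
print(history)
```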