dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

Include Bayesian sampling in Hyperband implementation #697

Open stsievert opened 4 years ago

stsievert commented 4 years ago

The paper "BOHB: Robust and Efficient Hyperparameter Optimization at Scale" includes an interesting parallelization technique for Bayesian sampling in a Hyperband implementation. In Section 4.2 the describe a scheme that does the following:

  1. Has a global hyperparameter space for Bayesian sampling. This hyperparameter space will be refined over time according to the Bayesian sampling principle.
  2. Initializes models in a particular order:
    • At first, initialize num_workers models. Train them as the most aggressive bracket of Hyperband specifies.
    • When a model is stopped, initialize a new model with parameters sampled from the current hyperparameter space estimate. This model is from the most aggressive bracket if that bracket is not complete; otherwise it's from the next most aggressive bracket.

The number of workers will definitely influence performance: with infinitely many workers, the Bayesian sampling algorithm has no time to run any inference on the best set of parameters. Conversely, with a single worker, Bayesian sampling can do as much inference as possible.
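A rough sketch of that scheduling loop in Python (the `Bracket` bookkeeping and the `space`, `launch`, and `next_stopped` objects/callables are hypothetical placeholders, not dask-ml or BOHB API):

```python
from dataclasses import dataclass


@dataclass
class Bracket:
    # Minimal stand-in for a successive-halving bracket.
    n_left: int  # models this bracket still needs to evaluate

    @property
    def complete(self):
        return self.n_left <= 0


def bohb_schedule(num_workers, brackets, space, launch, next_stopped):
    """brackets: Hyperband brackets, most aggressive first.
    space: global hyperparameter-space estimate with .sample()/.update().
    launch: submit a sampled config to a worker, returning a handle.
    next_stopped: block for the next stopped model and remove it from
    the running pool, returning (params, score, bracket).
    """
    # 1. Fill the cluster with models from the most aggressive bracket.
    running = [launch(space.sample(), brackets[0]) for _ in range(num_workers)]
    while any(not b.complete for b in brackets):
        params, score, bracket = next_stopped(running)
        bracket.n_left -= 1
        space.update(params, score)  # refine the global estimate
        # 2. Replace the stopped model: stay in the most aggressive
        #    incomplete bracket, else fall back to the next one.
        target = next((b for b in brackets if not b.complete), None)
        if target is not None:
            running.append(launch(space.sample(), target))
```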

They show this performance:

[Figure from the BOHB paper: performance as a function of the number of workers]

Similar to Dask-ML's benchmark, they start saturating between 16 and 32 workers.

mrocklin commented 4 years ago

Sounds fun


UTUnex commented 4 years ago

Any updates here?

TomAugspurger commented 4 years ago

Probably not. All development happens on GitHub.


UTUnex commented 4 years ago

Sorry, I mean, do you plan to implement BOHB in the near future? I just hope I can use it when it's available :)

TomAugspurger commented 4 years ago

I don't believe anyone is working on it at the moment, though @stsievert might have a better idea.


stsievert commented 4 years ago

> do you have the plan to implement the BOHB in the near future?

I don't know of anyone who has a plan to implement BOHB. I have some ideas on how to implement it, but that's about it.

edit 2021-10: this would require a lot of work around initializing new models. There needs to be interplay with the different successive halving brackets, which means `_fit` needs significant reworking. I think this would best be enabled by making `_fit` a class to separate the various components. Customization could be enabled by various callbacks. Here's a prototype:

```python
from copy import deepcopy

from distributed import as_completed
from sklearn.base import BaseEstimator


class _HyperOpt:
    def __init__(self, initial_params, model_fn):
        self.initial_params = initial_params
        self.model_fn = model_fn

    def start_fit(self):
        # Launch the initial pool of models. n_models, launch, and
        # random_params are placeholders in this prototype.
        self.launched_models = self.n_models
        return [
            self.launch(self.model_fn(**random_params))
            for _ in range(self.n_models)
        ]

    def decision_made(self, ident: str, model: BaseEstimator, keep_training: bool):
        # Callback hook: subclasses customize what happens when
        # Hyperband decides a model's fate.
        pass

    def _fit(self):
        # Reworked version of dask_ml.model_selection._incremental._fit
        futures = self.start_fit()
        for future in as_completed(futures):
            ...
            promoted, fired = hyperband_alg()  # placeholder for the SHA decisions
            for ident, model in promoted:
                self.decision_made(ident, model, keep_training=True)
            for ident, model in fired:
                self.decision_made(ident, model, keep_training=False)


class _BayesianOnHyperBand(_HyperOpt):
    def decision_made(self, ident, model, keep_training):
        # Refine the global space estimate with this model's result...
        self.params_ = bayesian_update(model, self.params_)
        # ...and replace a stopped model with one sampled from the
        # refined estimate (sample is a placeholder).
        if self.launched_models < self.n_models and not keep_training:
            self.launched_models += 1
            self.launch(self.model_fn(**sample(self.params_)))

    def start_fit(self):
        self.params_ = deepcopy(self.initial_params)
        self.launched_models = self.n_workers
        return [
            self.launch(self.model_fn(**self.initial_params))
            for _ in range(self.n_workers)
        ]
```
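The `bayesian_update`/`sample` placeholders above are where the actual Bayesian machinery would live. BOHB itself fits TPE-style kernel density estimates to the good and bad configurations observed so far and samples new configurations to maximize their density ratio. Here's a loose sketch of that idea using `scipy.stats.gaussian_kde`; the signatures differ from the placeholders above, hyperparameters are assumed to be encoded as numeric vectors, and none of this is the paper's exact estimator or existing dask-ml API:

```python
import numpy as np
from scipy.stats import gaussian_kde


def bayesian_update(history, gamma=0.15):
    """Fit TPE-style density estimates from (params_vector, score) pairs.

    Assumes enough observations for both KDEs to be well defined.
    """
    ranked = sorted(history, key=lambda h: h[1], reverse=True)
    split = max(int(gamma * len(ranked)), 2)
    good = np.array([p for p, _ in ranked[:split]]).T  # shape (dims, n_good)
    bad = np.array([p for p, _ in ranked[split:]]).T   # shape (dims, n_bad)
    return gaussian_kde(good), gaussian_kde(bad)


def sample(kdes, n_candidates=64):
    """Draw candidates from the "good" KDE and keep the one that
    maximizes the density ratio l(x) / g(x), as in TPE/BOHB."""
    good_kde, bad_kde = kdes
    candidates = good_kde.resample(n_candidates)  # shape (dims, n_candidates)
    ratio = good_kde(candidates) / bad_kde(candidates)
    return candidates[:, int(np.argmax(ratio))]
```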