Open stsievert opened 4 years ago
The paper "BOHB: Robust and Efficient Hyperparameter Optimization at Scale" (http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf) includes an interesting parallelization technique for Bayesian sampling in a Hyperband implementation. In Section 4.2 they describe a scheme that does the following:
- Has a global hyperparameter space for Bayesian sampling. This hyperparameter space is refined over time according to the Bayesian sampling principle.
- Initializes models in a particular order:
  - At first, initialize num_workers models. Train them as the most aggressive bracket of Hyperband specifies.
  - When a model is stopped, initialize a new model with parameters sampled from the current hyperparameter space estimate. This model comes from the most aggressive bracket if that bracket is not complete; otherwise it comes from the next most aggressive bracket.

The number of workers will definitely influence performance: if there are infinite workers, the Bayesian sampling algorithm will not have time to run any inference on the best set of parameters. Likewise, if there's one worker, Bayesian sampling can do as much inference as possible.

Sounds fun
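The scheduling scheme described above can be sketched as a toy loop. This is only an illustration of the launch order, not dask-ml code: `bohb_style_schedule` and its arguments are hypothetical names, and the `sample_config` callable stands in for BOHB's refined hyperparameter-space estimate (which in the real algorithm improves as results arrive).

```python
def bohb_style_schedule(num_workers, brackets, sample_config):
    """Toy sketch of the BOHB scheduling scheme described above.

    ``brackets`` lists how many models each successive-halving bracket
    launches, ordered from most to least aggressive. ``sample_config``
    draws a config from the current hyperparameter-space estimate.
    Returns the (bracket, config) pairs in launch order.
    """
    remaining = list(brackets)  # models still to launch, per bracket
    launched = []

    def next_bracket():
        # Most aggressive bracket first; move on once it is complete.
        for i, n in enumerate(remaining):
            if n > 0:
                return i
        return None

    # At first, fill every worker, starting with the most aggressive bracket.
    running = []
    for _ in range(num_workers):
        b = next_bracket()
        if b is None:
            break
        remaining[b] -= 1
        cfg = sample_config()
        running.append((b, cfg))
        launched.append((b, cfg))

    # When a model stops, replace it with a freshly sampled model.
    while running:
        running.pop(0)  # pretend the oldest running model just stopped
        b = next_bracket()
        if b is not None:
            remaining[b] -= 1
            cfg = sample_config()
            running.append((b, cfg))
            launched.append((b, cfg))

    return launched

# Two workers, brackets needing 3 and 2 models: the aggressive bracket
# fills first, so the launch order of brackets is [0, 0, 0, 1, 1].
order = bohb_style_schedule(2, [3, 2], lambda: {"lr": 0.1})
```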
Any updates here?
Probably not. All development happens on GitHub.
Sorry, I meant: do you have a plan to implement BOHB in the near future? I just hope I can use it when it's available :)
I don't believe anyone is working on it at the moment, though @stsievert might have a better idea.
> do you have a plan to implement BOHB in the near future?
I don't know of anyone that has a plan to implement BOHB. I have some ideas on how to implement it but that's about it.
edit 2021-10: this would require a lot of work around initializing new models. There needs to be interplay with the different successive halving brackets, which means `_fit` needs significant reworking. I think this would best be enabled by making `_fit` a class to separate the various components. Customization could be enabled by various callbacks. Here's a prototype:
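A rough illustration of the `_fit`-as-a-class idea with callback hooks might look like the following. All names here (`IncrementalSearch`, `on_model_stopped`, `bayesian_callback`) are hypothetical, not the actual dask-ml API; the callback is where a BOHB-style sampler would plug in.

```python
class IncrementalSearch:
    """Drives successive-halving brackets, delegating the 'what model
    comes next' decision to a callback (hypothetical sketch)."""

    def __init__(self, brackets, on_model_stopped=None):
        # brackets: {bracket_id: number of models left to launch},
        # lower ids being the more aggressive brackets.
        self.brackets = brackets
        self.on_model_stopped = on_model_stopped

    def _next_params(self, bracket):
        # Customization point: the callback chooses the next model's
        # hyperparameters. Plain Hyperband would return None (no new model).
        if self.on_model_stopped is not None:
            return self.on_model_stopped(self, bracket)
        return None

    def fit(self):
        launched = []
        # Walk brackets from most to least aggressive.
        for bracket in sorted(self.brackets):
            while self.brackets[bracket] > 0:
                self.brackets[bracket] -= 1
                params = self._next_params(bracket)
                launched.append((bracket, params))
        return launched


def bayesian_callback(search, bracket):
    # Stand-in for sampling from BOHB's refined hyperparameter-space
    # estimate; a real implementation would fit a model to past results.
    return {"alpha": 0.5}
```

The point of the class is separation: the bracket bookkeeping lives in `fit`, while the sampling strategy is swappable via the callback, which is what BOHB would need.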
They show this performance: [figure from the paper: performance vs. number of workers]
Similar to Dask-ML's benchmark, they start saturating between 16 and 32 workers.