Zhylkaaa opened 2 years ago
Add a feedback loop from the launcher to the sweeper/experiment generator to allow for stream processing with `ProcessPoolExecutor`s and similar.
What sort of feedback loop did you have in mind?
I would like to propose creating experiment generator class
How would this fit together with the sweeper and launcher? What is the interface for communication between them?
It's the only way we can achieve high GPU utilization.
Do you have a working prototype?
Hi @Jasha10
We were thinking of an object that serves as a proxy between the study and the launcher. Specifically, we implemented this by extending `Generator` and using `generator.send()`. We expect the generator to yield a 3-tuple `(index, trial, overrides)` and to receive (via `send`) a 3-tuple `(indexes, trials, results)` of experiments that have already finished.
Yes, we have a POC implemented with joblib, a custom loky launcher, and the Optuna sweeper, but if we come to an agreement we can adapt all sweepers and launchers to the new API. I will open a PR and link it here.
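The handshake described above can be sketched as follows. All names here (`experiment_generator`, `drive`, the trial/override payloads) are hypothetical stand-ins, since only the tuple shapes are specified so far:

```python
def experiment_generator(n_trials):
    """Yields (index, trial, overrides) tuples and receives, via send(),
    an (indexes, trials, results) tuple of already-finished experiments."""
    seen = {}
    for index in range(n_trials):
        finished = yield (index, {"number": index}, [f"model.lr={0.1 / (index + 1):.3f}"])
        if finished is not None:
            indexes, trials, results = finished
            # A real sweeper would update the study here, e.g. study.tell(...).
            seen.update(zip(indexes, results))


def drive(gen, run_job):
    """Toy sequential launcher loop: fetch a job, run it,
    and report the result back to the generator via send()."""
    index, trial, overrides = next(gen)
    history = []
    while True:
        result = run_job(overrides)
        history.append((index, result))
        try:
            index, trial, overrides = gen.send(([index], [trial], [result]))
        except StopIteration:
            break
    return history


jobs = drive(experiment_generator(3), run_job=len)
print(jobs)  # -> [(0, 1), (1, 1), (2, 1)]: each override list has one entry
```

In the real design the launcher would run several such jobs concurrently and batch the finished ones into a single `send()` call; this sketch only shows the sequential handshake.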
Hi @Jasha10, to confirm that this isn't stale: we need to pass internal reviews and will try to send it ASAP once we have clearance.
🚀 Feature Request
Delegate experiment scheduling to the launcher and add a feedback loop from the launcher to the sweeper/experiment generator to allow for stream processing with `ProcessPoolExecutor`s and similar, mainly to increase resource utilization and avoid waiting for all batch processes to finish.

Motivation
**Is your feature request related to a problem? Please describe.**
My team uses Optuna for hyperparameter sweeps of ML models with varying training times, so if, for example, we use an 8-GPU server and train 8 models in parallel with e.g. the joblib launcher, we can waste up to 1/4 of the walltime waiting for all batch jobs to finish.
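To make the wasted-walltime claim concrete, here is a toy calculation with hypothetical job durations: batch scheduling waits for the slowest job in each batch of 8, while streaming hands each freed GPU the next job immediately.

```python
import heapq

def batch_walltime(durations, workers):
    """Each batch of `workers` jobs must fully finish before the next starts,
    so the walltime is the sum of the slowest job in each batch."""
    return sum(max(durations[i:i + workers])
               for i in range(0, len(durations), workers))

def stream_walltime(durations, workers):
    """Greedy list scheduling: each job goes to the earliest-free worker."""
    free = [0] * workers  # min-heap of each worker's next-free time
    heapq.heapify(free)
    for d in durations:
        heapq.heappush(free, heapq.heappop(free) + d)
    return max(free)

# Hypothetical training times (hours) for 16 trials on an 8-GPU server.
durations = [3, 4, 4, 5, 6, 6, 7, 12, 3, 4, 4, 5, 6, 6, 7, 12]
print(batch_walltime(durations, 8))   # -> 24 (two batches, each gated by the 12h job)
print(stream_walltime(durations, 8))  # -> 19 (~21% less walltime)
```

The exact savings depend on the duration spread, but with a long-tailed job mix the batch scheme routinely idles most GPUs while the slowest job finishes.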
Pitch
**Describe the solution you'd like**
I would like to propose creating an experiment generator class (so it's customizable). The sole purpose of this class is to return new configurations and receive experiment results to update the study. I have my own implementation based on the generator's `send` method, and it is working pretty well with my own loky launcher (aka `concurrent.futures.ProcessPoolExecutor`), which I can also contribute.

**Describe alternatives you've considered**
It's the only way we can achieve high GPU utilization. Alternatively, we could adapt the Optuna plugin to accept futures and manage them itself, but I think job launching and awaiting results is the role of the launcher.

**Are you willing to open a pull request? (See CONTRIBUTING)**
Yes, I can open a pull request with my implementation if this feature is interesting.
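A minimal sketch of the stream-processing idea, using `ThreadPoolExecutor` so the example is self-contained and picklability is not an issue; the same `concurrent.futures` pattern applies to `ProcessPoolExecutor` and a loky launcher. `stream_launch` and the job tuples are hypothetical names, not the proposed Hydra API:

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def stream_launch(jobs, run_job, max_workers=2):
    """Keep max_workers jobs in flight and submit the next job as soon as
    any single job finishes, instead of waiting for a whole batch to drain."""
    jobs = iter(jobs)
    results = {}
    in_flight = {}  # Future -> job index
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        def submit_next():
            try:
                index, overrides = next(jobs)
            except StopIteration:
                return  # no more jobs to schedule
            in_flight[ex.submit(run_job, overrides)] = index

        for _ in range(max_workers):  # prime the pool
            submit_next()
        while in_flight:
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                results[in_flight.pop(fut)] = fut.result()
                # Feedback point: a sweeper could pick the next trial here
                # based on the result that just arrived.
                submit_next()
    return results

out = stream_launch([(i, i) for i in range(5)], run_job=lambda x: x * x)
print(out)  # -> {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```

The `submit_next()` call inside the completion loop is where the proposed generator feedback would plug in: instead of pulling from a fixed list, the launcher would `send()` the finished results to the experiment generator and receive the next trial.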