Zhylkaaa commented 2 years ago

🚀 Feature Request

Delegate experiment scheduling to the launcher and add a feedback loop from the launcher to the sweeper/experiment generator, allowing stream processing with ProcessPoolExecutor and similar executors. The main goal is to increase resource utilization and avoid waiting for every process in a batch to finish.
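A back-of-the-envelope way to see the cost of batch waiting: the small simulation below (plain Python, not Hydra code) compares wall time when trials run in fixed batches, where each batch waits for its slowest job, with a streaming schedule that starts the next trial as soon as any worker is free. The function names and durations are hypothetical and chosen only for illustration; they are not measurements.

```python
import heapq
from typing import Sequence


def batch_wall_time(durations: Sequence[float], n_workers: int) -> float:
    """Jobs run in fixed batches of n_workers; each batch waits for its slowest job."""
    total = 0.0
    for start in range(0, len(durations), n_workers):
        total += max(durations[start:start + n_workers])
    return total


def streaming_wall_time(durations: Sequence[float], n_workers: int) -> float:
    """A new job starts as soon as any worker frees up (jobs taken in submission order)."""
    free_at = [0.0] * n_workers  # min-heap: time at which each worker becomes idle
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Hypothetical training times (hours) for four trials on a 2-GPU machine.
durations = [4.0, 1.0, 4.0, 1.0]
print(batch_wall_time(durations, n_workers=2))      # 8.0
print(streaming_wall_time(durations, n_workers=2))  # 5.0
```

In the batched schedule the short trials leave a GPU idle until the long trial in the same batch completes; the streaming schedule fills that gap with the next trial.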

Motivation

Is your feature request related to a problem? Please describe.

My team uses Optuna for hyperparameter sweeps of ML models whose training times differ. If, for example, we train 8 models in parallel on an 8-GPU server with the joblib launcher, we can waste up to 1/4 of the wall time waiting for all jobs in a batch to finish.

Pitch

Describe the solution you'd like

I would like to propose an experiment generator class (so that it is customizable). The sole purpose of this class is to return new configurations and to receive experiment results in order to update the study. I have an implementation based on the generator's send method, and it works pretty well with my own loky launcher (aka concurrent.futures.ProcessPoolExecutor), which I can also contribute.
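A minimal sketch of the launcher side of this idea, using only `concurrent.futures` as a stand-in for the loky-based launcher (this is not Hydra's launcher API). `stream_launch` and `run_job` are hypothetical names, and the objective inside `run_job` is made up. The point is that a freed worker slot is refilled immediately instead of waiting for the rest of the batch.

```python
import concurrent.futures as cf
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple, Type

Overrides = Dict[str, Any]


def run_job(overrides: Overrides) -> float:
    """Hypothetical stand-in for one training run; returns the objective value."""
    return (overrides["lr"] - 0.01) ** 2


def stream_launch(
    job_fn: Callable[[Overrides], float],
    configs: Iterable[Overrides],
    n_workers: int,
    executor_cls: Type[cf.Executor] = cf.ProcessPoolExecutor,
) -> Iterator[Tuple[int, Overrides, float]]:
    """Keep up to n_workers jobs in flight and yield (index, overrides, result)
    in completion order, refilling a slot as soon as any job finishes."""
    todo = enumerate(configs)
    with executor_cls(max_workers=n_workers) as pool:
        pending: Dict[cf.Future, Tuple[int, Overrides]] = {}

        def refill() -> None:
            # Submit new jobs until every worker is busy or the configs run out.
            for idx, cfg in todo:
                pending[pool.submit(job_fn, cfg)] = (idx, cfg)
                if len(pending) >= n_workers:
                    break

        refill()
        while pending:
            # Block only until at least one job is done, not until all are.
            done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                idx, cfg = pending.pop(fut)
                yield idx, cfg, fut.result()
            refill()
```

With the default `ProcessPoolExecutor`, `job_fn` must be a picklable top-level function and the call should sit under an `if __name__ == "__main__":` guard; passing `executor_cls=cf.ThreadPoolExecutor` is convenient for quick local testing. Results arrive in completion order, which is why each one carries its submission index.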

Describe alternatives you've considered

It's the only way we can achieve high GPU utilization. Alternatively, we could adapt the Optuna plugin to accept futures and manage them, but I think launching jobs and awaiting results is the launcher's role.

Are you willing to open a pull request? (See CONTRIBUTING)

Yes, I can open a pull request with my implementation if this feature is of interest.

Additional context

Add any other context or screenshots about the feature request here.

Jasha10 commented 2 years ago

add feedback loop from launcher to sweeper/experiment generator to allow for stream processing with ProcessPoolExecutors and similar

What sort of feedback loop did you have in mind?

I would like to propose creating experiment generator class

How would this fit together with the sweeper and launcher? What is the interface for communication between them?

it's the only way we can achieve high gpu utilization.

Do you have a working prototype?

Zhylkaaa commented 2 years ago

Hi @Jasha10, we were thinking of an object that serves as a proxy between the study and the launcher. Specifically, we implemented this by extending Generator and using generator.send(). We expect the generator to return a 3-tuple (index, trial, overrides) and to receive (via send) a 3-tuple (indexes, trials, results) for experiments that have already finished.
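A minimal sketch of the proxy protocol described above, assuming the generator is implemented by subclassing `collections.abc.Generator` (only `send` and `throw` must be written; `__next__`, `__iter__` and `close` come from the mixin). `ExperimentGenerator`, `Trial`, and `run_sequentially` are hypothetical names, and the "study" is plain random search over a learning rate. An Optuna-backed version would presumably call `study.ask()` where a trial is created and `study.tell(trial, value)` where results are received.

```python
import random
from collections.abc import Generator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Overrides = Dict[str, Any]


@dataclass
class Trial:
    """Hypothetical trial record (an Optuna-backed version would wrap optuna.Trial)."""
    number: int
    params: Overrides
    value: Optional[float] = None


Feedback = Tuple[Sequence[int], Sequence[Trial], Sequence[float]]


class ExperimentGenerator(Generator):
    """Proxy between a study and a launcher.

    next(gen) / gen.send(feedback) return (index, trial, overrides) for a new
    experiment; feedback is (indexes, trials, results) for experiments that
    finished since the last call.
    """

    def __init__(self, n_trials: int, seed: int = 0) -> None:
        self.n_trials = n_trials
        self.rng = random.Random(seed)
        self.trials: List[Trial] = []
        self.finished = False

    def send(self, feedback: Optional[Feedback]) -> Tuple[int, Trial, Overrides]:
        if self.finished:
            raise StopIteration
        if feedback is not None:
            _indexes, trials, results = feedback
            for trial, result in zip(trials, results):
                trial.value = result  # update the "study" with finished experiments
        if len(self.trials) >= self.n_trials:
            raise StopIteration  # budget spent; results above were still recorded
        index = len(self.trials)
        overrides = {"lr": 10 ** self.rng.uniform(-4, -1)}
        trial = Trial(number=index, params=overrides)
        self.trials.append(trial)
        return index, trial, overrides

    def throw(self, typ, val=None, tb=None):
        self.finished = True  # any exception thrown in (e.g. by close()) ends the generator
        return super().throw(typ, val, tb)

    def best(self) -> Trial:
        done = [t for t in self.trials if t.value is not None]
        return min(done, key=lambda t: t.value)


def run_sequentially(gen: ExperimentGenerator, objective: Callable[[Overrides], float]) -> None:
    """Simplest possible 'launcher': one job at a time, feeding each result back."""
    try:
        index, trial, overrides = next(gen)
        while True:
            result = objective(overrides)
            index, trial, overrides = gen.send(([index], [trial], [result]))
    except StopIteration:
        pass
```

A streaming launcher would instead call `gen.send(...)` whenever some jobs finish, passing only those. After the budget is exhausted it can keep sending results for still-running jobs: each such call records the results before raising `StopIteration`.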

Yes, we have a POC implemented with joblib, a custom loky launcher, and the Optuna sweeper, but if we come to an agreement we can adapt all sweepers and launchers to the new API. I will open a PR and link it here.

Zhylkaaa commented 2 years ago

Hi @Jasha10, just to confirm that this isn't stale: we need to pass internal reviews and will try to send it ASAP once we have clearance.

facebookresearch / hydra

[Feature Request] Optuna experiment stream processing #2435
