facebookresearch / hydra

Hydra is a framework for elegantly configuring complex applications
https://hydra.cc
MIT License
8.66k stars, 623 forks

[Help requested] Optuna Sweeper Multiple GPU Parallelism #2892

Open gracikk-ds opened 5 months ago

gracikk-ds commented 5 months ago

Hello Hydra Team,

I am exploring how to use the Optuna Sweeper for hyperparameter tuning with a grid search spread across multiple processes. My goal is to use the multiple GPUs on my machine to run experiments in parallel.

The Optuna tutorial (https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/004_distributed.html) recommends creating a shared storage, opening multiple terminal windows, and running the processes concurrently. However, I've run into a problem: the grid sampler, as configured through Hydra, does not seem to accept a 'seed' argument, so all parallel runs end up with identical parameter combinations.

Could you please advise on the following:

  1. Is there a recommended way to use Optuna Sweeper with Hydra in a multiple GPU setup to ensure diverse hyperparameter combinations across different processes?
  2. Are there any plans to support seed functionality in Hydra grid sampler, or is there an alternative approach to achieve the desired parallelism without parameter duplication?

Thank you for your assistance!

Here is my config:

# @package _global_

# PYTHONPATH=$(pwd) CUDA_VISIBLE_DEVICES=0 python src/cli/train.py hparams_search=optune_search

defaults:
  - override /hydra/sweeper: optuna
  - override /hydra/sweeper/sampler: grid
  - override /trainer: gpu

hyper_search: True

trainer:
  min_steps: 27000  # prevents early stopping
  max_steps: 27000  # steps to train for

# metric which will be optimized by Optuna
optimized_metric: "test/MR-mAP-Full_Avg"
database: "positions"
db_cred: "postgres"

# here we define Optuna hyperparameter search
# it optimizes for value returned from function with @hydra.main decorator
# docs: https://hydra.cc/docs/next/plugins/optuna_sweeper
hydra:
  mode: "MULTIRUN" # set hydra to multirun by default if this config is attached
  sweeper:
    _target_: hydra_plugins.hydra_optuna_sweeper.optuna_sweeper.OptunaSweeper
    storage: "postgresql://${db_cred}:${db_cred}@localhost/${database}"  # storage URL to persist optimization results
    study_name: "positions"  # name of the study to persist optimization results
    n_jobs: 1  # number of parallel workers
    direction: maximize  # 'minimize' or 'maximize' the objective
    n_trials: 6  # total number of runs that will be executed

    sampler:
      _target_: optuna.samplers.GridSampler

    # define hyperparameter search space
    params:
      model.model.pos_temp: choice(200, 400, 800, 1600, 3200, 6400, 12800, 25600)
      model.model.detector_pos_temp: choice(200, 400, 800, 1600, 3200, 6400, 12800, 25600)

CharlesAttend commented 1 month ago

Check this out: https://github.com/facebookresearch/hydra/issues/1974#issuecomment-1226185827
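
Separately from the linked comment, here is a minimal sketch of one way to avoid duplicated combinations when the search is a plain grid: enumerate the grid once in a fixed order and give each GPU a disjoint round-robin slice. This is an assumption-laden workaround, not a Hydra or Optuna feature. It bypasses the Optuna sampler and storage entirely and launches ordinary single runs with Hydra `key=value` overrides. The helper names (`grid_points`, `shard`, `command_for`) and the GPU count are hypothetical; the parameter names, values, and script path are copied from the config above. Any other overrides the training script needs (for example the trainer group) would have to be added to the command.

```python
import itertools
import shlex

# Search space copied from the config in this issue.
SEARCH_SPACE = {
    "model.model.pos_temp": [200, 400, 800, 1600, 3200, 6400, 12800, 25600],
    "model.model.detector_pos_temp": [200, 400, 800, 1600, 3200, 6400, 12800, 25600],
}


def grid_points(space):
    """Enumerate every combination of the grid in a fixed, reproducible order."""
    keys = list(space)
    value_lists = [space[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def shard(points, worker_index, num_workers):
    """Return the disjoint round-robin slice of the grid owned by one worker."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return points[worker_index::num_workers]


def command_for(point, gpu_id, script="src/cli/train.py"):
    """Build a single-run command pinned to one GPU via CUDA_VISIBLE_DEVICES."""
    overrides = " ".join(f"{key}={value}" for key, value in point.items())
    return f"CUDA_VISIBLE_DEVICES={gpu_id} python {shlex.quote(script)} {overrides}"


if __name__ == "__main__":
    num_gpus = 2  # hypothetical GPU count; adjust to the machine
    points = grid_points(SEARCH_SPACE)
    for gpu_id in range(num_gpus):
        # Print only the first two commands per GPU to keep the output short.
        for point in shard(points, gpu_id, num_gpus)[:2]:
            print(command_for(point, gpu_id))
```

Because every worker derives its slice from the same ordered list, no coordination or seed is needed, and the slices together cover all 64 combinations exactly once. The trade-off is that results are no longer recorded in the Optuna study, so the metric would have to be collected some other way.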