Epistimio / orion

Asynchronous Distributed Hyperparameter Optimization.
https://orion.readthedocs.io

fidelity_index doesn't support nested param #1125

Open FrancoisPgm opened 11 months ago

FrancoisPgm commented 11 months ago

Describe the bug I am running Orion with the Hydra plugin. When I use a nested param of the config as the fidelity dimension for BOHB, e.g. hydra.sweeper.params.model.trainer.max_epochs: "fidelity(low=1, high=2)", the fidelity_index gets set to "model.trainer.max_epochs", but the trial.params dict keeps the nested structure:

{'model': {'params': {'lr': 0.0001783,
                      'lr_scheduler_args': {'T_max': 72312},
                      'weight_decay': 0.01001},
           'trainer': {'max_epochs': 1.0}}}

So I get:

  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 308, in has_suggested_all_possible_values
    fidelity_value = trial.params[fidelity_index]
KeyError: 'model.trainer.max_epochs'
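
The mismatch can be reproduced outside Orion with plain dictionaries. This is only a minimal sketch: the `params` dict is copied from the output above, and `get_nested` is a hypothetical helper, not part of Orion. It shows that indexing the nested dict with the dotted key fails, while walking the key segment by segment finds the value.

```python
from functools import reduce

# Nested params, copied from the trial.params output shown above.
params = {
    "model": {
        "params": {
            "lr": 0.0001783,
            "lr_scheduler_args": {"T_max": 72312},
            "weight_decay": 0.01001,
        },
        "trainer": {"max_epochs": 1.0},
    }
}
fidelity_index = "model.trainer.max_epochs"


def get_nested(mapping, dotted_key, sep="."):
    """Hypothetical helper (not an Orion API): follow a dotted key
    through a nested dict, one segment at a time."""
    return reduce(lambda node, part: node[part], dotted_key.split(sep), mapping)


# Direct indexing with the dotted key raises KeyError, as in the traceback.
lookup_error = None
try:
    params[fidelity_index]
except KeyError as err:
    lookup_error = err

# Walking the nested structure finds the fidelity value.
fidelity_value = get_nested(params, fidelity_index)
```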

Expected behavior I'd expect either the fidelity_index to keep the nested structure somehow, or the trial.params dict to get flattened keys, something like:

{
    'model.params.lr': 0.0001783,
    'model.params.lr_scheduler_args.T_max': 72312,
    'model.params.weight_decay': 0.01001,
    'model.trainer.max_epochs': 1.0
}
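
A rough sketch of the flattening I have in mind, in plain Python. The `flatten` function here is a hypothetical helper written for illustration, not Orion's own code, and it assumes that no key contains the "." separator itself.

```python
def flatten(nested, parent_key="", sep="."):
    """Hypothetical helper (not an Orion API): turn a nested dict into a
    single-level dict whose keys are dotted paths."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the dotted prefix.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Nested params, copied from the trial.params output in this report.
nested_params = {
    "model": {
        "params": {
            "lr": 0.0001783,
            "lr_scheduler_args": {"T_max": 72312},
            "weight_decay": 0.01001,
        },
        "trainer": {"max_epochs": 1.0},
    }
}

flat_params = flatten(nested_params)
# With flattened keys, the dotted fidelity_index can be used directly.
fidelity_value = flat_params["model.trainer.max_epochs"]
```

Note that this sketch drops empty sub-dicts, which may or may not be acceptable for real trial params.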

For now I can easily avoid the issue by using a non-nested param in my config file: hydra.sweeper.params.max_epochs: "fidelity(low=1, high=2)"

Steps to reproduce Define a fidelity dimension with a nested param.

Additional context The full error log:

[2023-12-05 08:13:00,956][HYDRA] Orion Optimizer {'type': 'bohb', 'config': {'seed': 1, 'min_points_in_model': 4, 'top_n_percent': 40, 'num_samples': 5}}
[2023-12-05 08:13:00,956][HYDRA] with parametrization {'model.params.lr': 'loguniform(1e-05, 0.01)', 'model.params.lr_scheduler_args.T_max': 'uniform(1000, 100000, discrete=True)', 'model.params.weight_decay': 'loguniform(0.01, 100)', 'model.trainer.max_epochs': 'fidelity(1, 2)'}
Traceback (most recent call last):
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 353, in clientctx
    yield client
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 510, in sweep
    raise e
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 507, in sweep
    self.optimize(self.client)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 525, in optimize
    trials = self.sample_trials()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 555, in sample_trials
    trials = self.suggest_trials(self.n_workers())
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 434, in suggest_trials
    trial = self.client.suggest(pool_size=count)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 563, in suggest
    if self.is_done:
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 167, in is_done
    return self._experiment.is_done
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/core/worker/experiment.py", line 541, in is_done
    self.algorithms.is_done and num_pending_trials == 0
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/core/worker/primary_algo.py", line 277, in is_done
    return super().is_done or self.algorithm.is_done
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 293, in is_done
    return self.has_completed_max_trials or self.has_suggested_all_possible_values()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 308, in has_suggested_all_possible_values
    fidelity_value = trial.params[fidelity_index]
KeyError: 'model.trainer.max_epochs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
    lambda: hydra.multirun(
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 162, in multirun
    ret = sweeper.sweep(arguments=task_overrides)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/orion_sweeper.py", line 79, in sweep
    return self.sweeper.sweep(arguments)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 510, in sweep
    raise e
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.9.6/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 355, in clientctx
    client.close()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 828, in close
    raise RuntimeError(
RuntimeError: There is still reserved trials: dict_keys(['7ba7eed37ff08c60dc9bad9341405be4'])
Release all trials before closing the client, using client.release(trial).