DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

ZeroDivisionError: division by zero #238

Closed ahamza360 closed 2 years ago

ahamza360 commented 2 years ago

I am following this notebook: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/rl-baselines-zoo.ipynb#scrollTo=w2sC22eGHTH-

When I run python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median

It gives the following error:

========== MountainCar-v0 ==========
Seed: 187784293
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('ent_coef', 0.0), ('gae_lambda', 0.98), ('gamma', 0.99), ('n_envs', 16), ('n_epochs', 4), ('n_steps', 16), ('n_timesteps', 1000000.0), ('normalize', True), ('policy', 'MlpPolicy')])
Using 16 environments
Overwriting n_timesteps with n=50000
Doing 0 intermediate evaluations for pruning based on the number of timesteps. (1 evaluation every 100k timesteps)
Normalization activated: {'gamma': 0.99}
Optimizing hyperparameters
/usr/local/lib/python3.7/dist-packages/optuna/samplers/_tpe/sampler.py:266: ExperimentalWarning: multivariate option is an experimental feature. The interface can change in the future.
  ExperimentalWarning,
Sampler: tpe - Pruner: median
[I 2022-04-21 15:18:02,699] A new study created in memory with name: no-name-64eb4a20-105a-4a16-8f45-a474918d2804
/usr/local/lib/python3.7/dist-packages/optuna/study/study.py:397: FutureWarning: n_jobs argument has been deprecated in v2.7.0. This feature will be removed in v4.0.0. See https://github.com/optuna/optuna/releases/tag/v2.7.0.
  FutureWarning,
Normalization activated: {'gamma': 0.99}
Normalization activated: {'gamma': 0.99}
Normalization activated: {'gamma': 0.99, 'norm_reward': False}
Normalization activated: {'gamma': 0.99, 'norm_reward': False}
[W 2022-04-21 15:18:12,093] Trial 1 failed because of the following error: ZeroDivisionError('division by zero')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/content/rl-baselines3-zoo/utils/exp_manager.py", line 664, in objective
    optuna_eval_freq = int(self.n_timesteps / self.n_evaluations)
ZeroDivisionError: division by zero
[W 2022-04-21 15:18:12,094] Trial 0 failed because of the following error: ZeroDivisionError('division by zero')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/content/rl-baselines3-zoo/utils/exp_manager.py", line 664, in objective
    optuna_eval_freq = int(self.n_timesteps / self.n_evaluations)
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "train.py", line 236, in <module>
    exp_manager.hyperparameters_optimization()
  File "/content/rl-baselines3-zoo/utils/exp_manager.py", line 750, in hyperparameters_optimization
    study.optimize(self.objective, n_trials=self.n_trials, n_jobs=self.n_jobs)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 106, in _optimize
    f.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 264, in _run_trial
    raise func_err
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/content/rl-baselines3-zoo/utils/exp_manager.py", line 664, in objective
    optuna_eval_freq = int(self.n_timesteps / self.n_evaluations)
ZeroDivisionError: division by zero

araffin commented 2 years ago

Hello, thanks for reporting the issue. It was apparently introduced in e4bc2842fa8820e5040e066f1a4fa52a6cf2b7e0 (https://github.com/DLR-RM/rl-baselines3-zoo/pull/226). I will push a fix soon; in the meantime, you can run git checkout v1.5.0 ;)
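
For readers hitting the same traceback, here is a minimal sketch of the arithmetic that seems to be behind it. The log reports "Doing 0 intermediate evaluations" with "1 evaluation every 100k timesteps", and the run was limited to 50000 timesteps, so the divisor in the failing line ends up being zero. The helper names below (n_evaluations_for, eval_freq, safe_eval_freq) are hypothetical, the rounding-down rule is an assumption inferred from the log, and the guard is only one possible workaround, not necessarily the fix that was pushed to the repository.

```python
# Values taken from the log in this issue; all function names are hypothetical.
N_TIMESTEPS = 50_000          # from "-n 50000" on the command line
TIMESTEPS_PER_EVAL = 100_000  # from "1 evaluation every 100k timesteps"


def n_evaluations_for(n_timesteps: int, timesteps_per_eval: int = TIMESTEPS_PER_EVAL) -> int:
    """Assumed rule: one evaluation per `timesteps_per_eval` steps, rounded down."""
    return n_timesteps // timesteps_per_eval


def eval_freq(n_timesteps: int, n_evaluations: int) -> int:
    """Same expression as the failing line shown in the traceback."""
    return int(n_timesteps / n_evaluations)


def safe_eval_freq(n_timesteps: int, n_evaluations: int) -> int:
    """Hypothetical guard: treat zero evaluations as a single evaluation."""
    return int(n_timesteps / max(n_evaluations, 1))


n_evaluations = n_evaluations_for(N_TIMESTEPS)
print(n_evaluations)  # 0, matching "Doing 0 intermediate evaluations"

try:
    eval_freq(N_TIMESTEPS, n_evaluations)
except ZeroDivisionError as err:
    print(f"ZeroDivisionError: {err}")  # same error as in the report

print(safe_eval_freq(N_TIMESTEPS, n_evaluations))  # 50000
```

Under these assumptions, any budget below 100k timesteps would trigger the error, which is consistent with the report; a budget of 100k or more would not.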

araffin commented 2 years ago

Now fixed ;)