araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

TRPO "underflow encountered in multiply" #59

Open jarlva opened 4 years ago

jarlva commented 4 years ago

While running a TRPO training, after some time (random, anywhere from 15 s to 1 min) it fails with the following:

```
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply
```

Using the latest version, 2.9.0, with Python 3.7.5.

araffin commented 4 years ago

Hello, please fill in the issue template completely.

jarlva commented 4 years ago

Training a custom Gym env with TRPO. After some time (random, anywhere from 30 s to 3 min) it fails with the following traceback. The error occurs only with TRPO: the same code/environment/gym completes successfully with other RL algorithms. I also tried the code below on CartPole-v1, but it does not trigger the error there (maybe because it's an easy environment).

```
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply
```

In the beginning the code seems to start fine. Yet at some point it goes into a "silent loop", with no updates on the console, as if frozen. The only way to reveal the error and force it to surface is to add `np.seterr(all='raise')` to the top of `stable_baselines/trpo_mpi/utils.py`, right after the line `import numpy as np`.
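The failing expression from `mpi_adam.py` can be reproduced in isolation to show why `np.seterr(all='raise')` surfaces an otherwise-silent event. This is a minimal sketch with made-up values: the tiny `exp_avg` stands in for a vanishing first-moment estimate, and the variable names mirror the traceback, but the numbers are purely illustrative.

```python
import numpy as np

np.seterr(all='raise')  # turn silent floating-point events into exceptions

# Illustrative stand-ins for MpiAdam's internal state (values are made up):
step_size = 3e-4                # vf_stepsize
exp_avg = np.array([1e-310])    # subnormal first-moment estimate
exp_avg_sq = np.array([0.0])    # second-moment estimate
epsilon = 1e-8

try:
    # Same shape as the line from mpi_adam.py's update():
    step = (-step_size) * exp_avg / (np.sqrt(exp_avg_sq) + epsilon)
except FloatingPointError as exc:
    print(exc)  # underflow encountered in multiply
```

Multiplying a subnormal `exp_avg` by the step size produces an even smaller subnormal result, which sets the underflow flag; with `all='raise'` NumPy converts that flag into the `FloatingPointError` seen above.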

Code example

```python
from stable_baselines import TRPO  # DQN, PPO2, A2C, ACKTR
import tensorflow.compat.v1.logging as tflogging
tflogging.set_verbosity(tflogging.ERROR)  # suppress TF warnings

import gym
import numpy as np
np.seterr(all='raise')

env = gym.make('Myrl-v0')
model = TRPO('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=900000)
```
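An underflow here only means the product rounded toward zero, which is usually harmless for an optimizer step. If the goal is just to keep the strict checks without tripping on this, one option (my suggestion, not an official fix) is to demote underflow while leaving the other error classes set to raise:

```python
import numpy as np

# Raise on genuinely dangerous events (overflow, invalid, divide),
# but let underflow quietly round into the subnormal range:
np.seterr(all='raise', under='ignore')

tiny = np.array([1e-310])
step = 0.01 * tiny  # underflows toward zero, no exception
```

`np.errstate(under='ignore')` can be used as a context manager instead, to scope the relaxation to just the Adam update.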

System Info

- Stable Baselines 2.9.0, installed via git and then `pip install -e .`
- Python 3.7.5
- Windows 10
- TensorFlow 1.15
- No GPU