IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Nans in PPO and Clipped PPO agents #418

Open ujjawalchugh97 opened 4 years ago

ujjawalchugh97 commented 4 years ago

Hi, I have tried applying the Clipped PPO agent in different environments, and after some time the surrogate loss, KL divergence, and entropy all become NaN. I've tried various hyperparameter settings; this sometimes postpones the crash, but the issue persists. I've faced a similar issue with PPO as well.

Many users have run into a similar problem (e.g. issue #87). Kindly suggest a solution to this problem.
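As background on where NaN probabilities like these often come from (this is a generic numpy illustration, not Coach's actual code): when policy logits grow very large, a naive softmax overflows to `inf/inf = nan`, whereas the standard max-subtraction trick keeps `exp()` in range.

```python
import numpy as np

def naive_softmax(logits):
    # Direct exponentiation: overflows to inf for large logits,
    # and inf / inf yields NaN probabilities.
    e = np.exp(logits)
    return e / e.sum()

def stable_softmax(logits):
    # Subtracting the max keeps exp() in range; the result is unchanged
    # mathematically but never overflows.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([1000.0, 0.0, -1000.0])  # "exploded" network outputs
with np.errstate(over='ignore', invalid='ignore'):
    p_naive = naive_softmax(logits)
p_stable = stable_softmax(logits)

print(np.isnan(p_naive).any())  # True
print(p_stable)                 # [1. 0. 0.]
```

If the NaNs originate earlier (e.g. exploding gradients in the value or policy loss), this alone won't fix training, but it narrows down where to look first.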

shadiendrawis commented 4 years ago

Hello,

Can you give a little more context on how to reproduce the problem? A code segment would be nice, or do you have this problem running one coach's prebuilt presets?

Shadi

ghidellar commented 4 years ago

It happened to me as well. The logs start showing this:

```
2020-06-22-22:13:34.169917 Policy training - Surrogate loss: nan KL divergence: nan Entropy: nan training epoch: 2 learning_rate: 0.0003
```

and both agents throw an exception:

```
File "/lib/python3.7/site-packages/rl_coach/exploration_policies/categorical.py", line 48, in get_action
    action = np.random.choice(self.action_space.actions, p=action_values)
File "mtrand.pyx", line 793, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
```
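The crash itself comes from `np.random.choice` rejecting a NaN probability vector. A hypothetical guard (this `safe_choice` helper is not part of rl_coach) can keep the run alive long enough to debug, by falling back to a uniform distribution when the probabilities are invalid:

```python
import numpy as np

def safe_choice(actions, action_values, rng=np.random):
    """Sample an action, falling back to uniform if probabilities are invalid.

    Hypothetical workaround, not part of rl_coach: it only masks the symptom
    (NaN probabilities) so training can continue long enough to diagnose the
    real source of the NaNs.
    """
    p = np.asarray(action_values, dtype=np.float64)
    if np.isnan(p).any() or p.sum() <= 0:
        p = np.full(len(actions), 1.0 / len(actions))  # uniform fallback
    else:
        p = p / p.sum()  # renormalize against rounding drift
    return rng.choice(actions, p=p)

# Would normally raise "ValueError: probabilities contain NaN":
print(safe_choice([0, 1, 2], [np.nan, 0.5, 0.5]))
```

Note that once the loss is NaN the network weights are usually already corrupted, so this is a diagnostic aid rather than a fix.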

Changing the hyperparameters does not fix it; it might just postpone the problem. I think Clipped PPO is even more sensitive to this issue.

Edit: I ran some quick tests and noticed that changing the activation function makes a big difference in when the exception occurs. I tried 'relu', 'tanh', 'selu', and 'leaky_relu', and the most "stable" seems to be 'tanh'.
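One plausible reason tanh appears more stable (a toy numpy sketch, not Coach's network): tanh outputs are bounded in (-1, 1), so activations cannot compound layer over layer, whereas ReLU activations are unbounded and can grow until downstream exponentials overflow into NaN.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=1.5, size=(64, 64))  # deliberately large-gain weights
x_relu = x_tanh = rng.normal(size=64)     # identical starting activations

for _ in range(20):
    x_relu = np.maximum(W @ x_relu, 0.0)  # ReLU: magnitudes compound freely
    x_tanh = np.tanh(W @ x_tanh)          # tanh: every output stays in (-1, 1)

print(np.abs(x_relu).max())  # grows without bound as depth increases
print(np.abs(x_tanh).max())  # always below 1
```

This doesn't make tanh a cure (gradient clipping or a smaller learning rate address the root cause more directly); it just matches the observation that bounded activations delay the blow-up.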

eflopez1 commented 3 years ago

Take a look at my suggestion on https://github.com/IntelLabs/coach/issues/87