ujjawalchugh97 opened this issue 4 years ago
Hello,
Can you give a little more context on how to reproduce the problem? A code segment would be nice. Or do you have this problem when running one of Coach's prebuilt presets?
Shadi
It happened to me as well. The logs start showing this:
2020-06-22-22:13:34.169917 Policy training - Surrogate loss: nan KL divergence: nan Entropy: nan training epoch: 2 learning_rate: 0.0003
and both agents throw an exception:
File "/lib/python3.7/site-packages/rl_coach/exploration_policies/categorical.py", line 48, in get_action
    action = np.random.choice(self.action_space.actions, p=action_values)
File "mtrand.pyx", line 793, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
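The last frame shows NumPy itself rejecting the distribution: `np.random.choice` validates `p` and raises as soon as it contains NaN, so the real failure happens earlier, when the network starts producing NaN action probabilities. Below is a minimal sketch, outside Coach, that reproduces the error and adds an early check. `sample_action` is a hypothetical helper, not part of Coach's API.

```python
import numpy as np


def sample_action(actions, probs, rng=None):
    """Hypothetical helper: validate the distribution, then sample an action."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # Fail early with the offending values instead of NumPy's generic error.
    if not np.all(np.isfinite(probs)):
        raise FloatingPointError(f"policy produced non-finite probabilities: {probs}")
    return rng.choice(actions, p=probs)


actions = np.arange(3)

# Reproduce the error from the traceback: NumPy rejects a NaN in p.
try:
    np.random.choice(actions, p=[np.nan, 0.5, 0.5])
except ValueError as err:
    print("numpy:", err)

# The guarded helper reports the same condition, including the bad values.
try:
    sample_action(actions, [np.nan, 0.5, 0.5])
except FloatingPointError as err:
    print("guard:", err)
```

A check like this only makes the symptom easier to report; it does not fix whatever drove the probabilities to NaN.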
Changing the hyperparameters does not fix it; at best it postpones the problem. I think Clipped PPO is even more sensitive to this problem.
Edit: I ran some quick tests and noticed that the choice of activation function makes a large difference in when the exception occurs. I tried 'relu', 'tanh', 'selu', and 'leaky_relu', and 'tanh' seems to be the most "stable".
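One possible mechanism behind `Entropy: nan` is sketched below. It is an assumption: nothing in this thread confirms how Coach computes its loss or entropy. Once the policy saturates, some action probabilities can underflow to exactly 0. An entropy computed as `-sum(p * log(p))` from those probabilities then contains `0 * log(0) = 0 * -inf = nan`, which can propagate into the loss, the gradients, and eventually the weights. Computing the same quantity from logits via a log-softmax avoids that term. The function names are illustrative, not Coach internals.

```python
import numpy as np


def naive_entropy(probs):
    # -sum(p * log p) on probabilities: any p == 0 gives 0 * -inf = nan.
    probs = np.asarray(probs, dtype=np.float64)
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.sum(probs * np.log(probs))


def log_softmax(logits):
    # Subtracting the max keeps exp() from overflowing.
    z = np.asarray(logits, dtype=np.float64)
    z = z - np.max(z)
    return z - np.log(np.sum(np.exp(z)))


def stable_entropy(logits):
    # log p stays finite for finite logits, so an underflowed p only
    # contributes 0 * (large negative number) = 0, never 0 * -inf.
    log_p = log_softmax(logits)
    return -np.sum(np.exp(log_p) * log_p)


logits = np.array([800.0, 799.0, -5.0])  # a nearly saturated policy
probs = np.exp(log_softmax(logits))      # third probability underflows to 0.0

print(naive_entropy(probs))    # nan
print(stable_entropy(logits))  # finite (about 0.58)
```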
Take a look at my suggestion on https://github.com/IntelLabs/coach/issues/87
Hi, I have tried applying Clipped PPO agents to different environments, and after some time the surrogate loss, KL divergence, and entropy all become NaN. I've tried various hyperparameter settings; this sometimes postpones the crash, but the issue persists. I've faced a similar issue with PPO as well.
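Since more reproduction context was requested, it may help to report the epoch at which the metrics first turn NaN. Below is a minimal sketch that scans console output for the first policy-training line with a non-finite value. The regular expression assumes the exact field layout of the log line quoted earlier in this thread and may need adjusting for other presets or Coach versions; `first_nan_epoch` is a hypothetical helper.

```python
import math
import re
from typing import Iterable, Optional, Tuple

# Assumed layout, taken from the "Policy training - ..." log line in this thread.
METRIC_RE = re.compile(
    r"Surrogate loss: (?P<loss>\S+) KL divergence: (?P<kl>\S+) "
    r"Entropy: (?P<entropy>\S+) training epoch: (?P<epoch>\d+)"
)


def first_nan_epoch(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (epoch, line) for the first line with a non-finite metric, else None."""
    for line in lines:
        match = METRIC_RE.search(line)
        if match is None:
            continue  # not a policy-training line
        values = (float(match[key]) for key in ("loss", "kl", "entropy"))
        if not all(math.isfinite(v) for v in values):
            return int(match["epoch"]), line.strip()
    return None


sample = [
    "2020-06-22-22:13:34.169917 Policy training - Surrogate loss: nan "
    "KL divergence: nan Entropy: nan training epoch: 2 learning_rate: 0.0003",
]
print(first_nan_epoch(sample))  # epoch 2, plus the offending line
```

Passing an open log file instead of `sample` works the same way, since iterating a text file yields its lines.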
Many users have run into a similar problem (e.g., issue #87). Please suggest a solution to this problem.