awjuliani / DeepRL-Agents

A set of Deep Reinforcement Learning Agents implemented in Tensorflow.
MIT License

A3C Doom: Basic scenario: How to select clipping? #20

Closed IbrahimSobh closed 7 years ago

IbrahimSobh commented 7 years ago

Why 40.0?

grads,self.grad_norms = tf.clip_by_global_norm(self.gradients,40.0)

awjuliani commented 7 years ago

Hi Ibrahim,

This is something that should be adjusted based on the performance findings of your own task. 40 is what was used in the OpenAI starter agent, so I used that here, as it led to convergence for the example task.
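For anyone landing here, the math behind `tf.clip_by_global_norm` is simple to reproduce. A minimal numpy sketch (the function name mirrors the TF op, but this is an illustrative reimplementation, not the repo's code): the norm is computed jointly over all gradient tensors, and if it exceeds the threshold, every tensor is rescaled by the same factor, so the update direction is preserved.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Global norm is taken over all gradient tensors together,
    # not per-tensor: sqrt(sum of squared entries across the list).
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    if global_norm > clip_norm:
        # Uniform rescale so the joint norm equals clip_norm;
        # relative magnitudes between tensors are unchanged.
        scale = clip_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

grads = [np.array([30.0, 40.0]), np.array([0.0, 0.0])]  # global norm = 50
clipped, norm = clip_by_global_norm(grads, 40.0)
# norm is 50.0; clipped[0] is [24.0, 32.0] (scaled by 40/50)
```

With a threshold of 40, any update whose joint norm is below 40 passes through untouched, which is why the value only matters for the occasional large spike.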

DMTSource commented 7 years ago

Would it be wise to use the Grad Norm plot from this project's TensorBoard to select a value close to the mean-max of the smoothed chart, to prevent overly large updates? Along those lines, is it worth considering an adaptive gradient norm clip derived from a moving average of the norm?

IbrahimSobh commented 7 years ago

Thank you @DMTSource for the elaboration.

Could you please give numerical examples? (for better understanding)

If Grad Norm is around 25, should we set clipping = 25? And if Grad Norm decreases over time (say to 20), should we then set clipping = 20?

DMTSource commented 7 years ago

Yes, that is what I am thinking. Looking at your png here, my question above asks whether setting the clip value somewhere around ~15-20 is appropriate. Or, in a dynamic sense, recreate the smoothed values seen in TensorBoard (careful: they are smoothed values of smoothed values) and use n_std*std_of_norms_rolling + mean_of_norms_rolling to determine an upper bound that does not throw away information.

But back to my above question for @awjuliani: is visual inspection of the Grad Norm plot even a valid way to determine the grad norm clip value, or will the gradient updates change magnitude with each hyperparameter update?

IbrahimSobh commented 7 years ago

I hope this will solve the health gathering scenario (so far I have failed to make it converge).

Waiting for @awjuliani ...