carpedm20 / deep-rl-tensorflow

TensorFlow implementation of Deep Reinforcement Learning papers
MIT License

clipping the delta zeros gradients #5

Closed cgel closed 7 years ago

cgel commented 8 years ago

It seems to me that in all of the agents you are clipping the delta (the TD error) itself. This means the gradients are zero for large errors. It might be because the paper "Human-level control through deep reinforcement learning" is misleading when it talks about clipping the error term. What they actually do in the implementation is use abs(delta) for abs(delta) > 1 and delta^2 for abs(delta) < 1, i.e. a Huber-style loss. This can be implemented like this:

delta_grad_clip = 1
# TD error between the target and the Q-value of the action taken
batch_delta = Y - DQN_acted
batch_delta_abs = tf.abs(batch_delta)
# quadratic part: |delta| capped at the clip value
batch_delta_quadratic = tf.minimum(batch_delta_abs, delta_grad_clip)
# linear part: whatever exceeds the clip value
batch_delta_linear = batch_delta_abs - batch_delta_quadratic
batch_loss = batch_delta_linear + batch_delta_quadratic**2
loss = tf.reduce_mean(batch_loss)
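To see the difference, here is a minimal sketch (TensorFlow 1.x style, using a standalone delta placeholder rather than the repo's actual tensors, and the conventional 0.5 factor so the gradient is exactly the clipped delta): the Huber-style loss has a gradient that saturates at +/-1, while clipping the delta before squaring makes the gradient vanish for large errors.

import tensorflow as tf  # assumes TensorFlow 1.x

delta = tf.placeholder(tf.float32, shape=[None])  # hypothetical TD errors

# Huber-style loss: quadratic for |delta| < 1, linear beyond that
delta_abs = tf.abs(delta)
quadratic = tf.minimum(delta_abs, 1.0)
linear = delta_abs - quadratic
huber_loss = tf.reduce_sum(linear + 0.5 * quadratic ** 2)

# naive version: clip the delta itself, then square it
clipped_delta = tf.clip_by_value(delta, -1.0, 1.0)
naive_loss = tf.reduce_sum(0.5 * tf.square(clipped_delta))

huber_grad = tf.gradients(huber_loss, delta)[0]
naive_grad = tf.gradients(naive_loss, delta)[0]

with tf.Session() as sess:
    feed = {delta: [0.3, 2.0, -5.0]}
    print(sess.run(huber_grad, feed))  # [0.3, 1.0, -1.0] -> saturates at +/-1
    print(sess.run(naive_grad, feed))  # [0.3, 0.0,  0.0] -> zero for large errors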
abred commented 7 years ago

Aren't you clipping the gradient twice now (if max_grad_norm is set), since you no longer check whether min_delta/max_delta is set?

self.clipped_error = tf.where(tf.abs(self.delta) < 1.0,
                               0.5 * tf.square(self.delta),
                               tf.abs(self.delta) - 0.5, name='clipped_error')

and

grads_and_vars[idx] = (tf.clip_by_norm(grad, self.max_grad_norm), var)

(whereas the first one should be equivalent to using:

grads_and_vars[idx] = (tf.clip_by_value(grad, -1, 1), var)

)
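For reference, a minimal sketch of how the norm clipping could stay optional (assuming hypothetical attribute names self.optim, self.loss, and self.max_grad_norm, not the repo's actual optimizer-building code), so that the Huber loss is the only clipping applied when max_grad_norm is unset:

# Hypothetical sketch: only add per-variable norm clipping when explicitly requested.
grads_and_vars = self.optim.compute_gradients(self.loss)
if self.max_grad_norm is not None:
    grads_and_vars = [
        (tf.clip_by_norm(grad, self.max_grad_norm), var)
        for grad, var in grads_and_vars if grad is not None
    ]
self.train_op = self.optim.apply_gradients(grads_and_vars)

With a guard like this, setting max_grad_norm opts into a second, per-variable clipping on top of the already bounded Huber gradient, rather than always applying both.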