Closed: cgel closed this issue 7 years ago
Aren't you clipping the gradient twice now when max_grad_norm is set, since you no longer check whether min_delta/max_delta is set?
self.clipped_error = tf.where(tf.abs(self.delta) < 1.0,
                              0.5 * tf.square(self.delta),
                              tf.abs(self.delta) - 0.5, name='clipped_error')
and
grads_and_vars[idx] = (tf.clip_by_norm(grad, self.max_grad_norm), var)
(whereas the first one should be equivalent to using:
grads_and_vars[idx] = (tf.clip_by_value(grad, -1, 1), var)
)
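To illustrate the equivalence claim, here is a minimal sketch (assuming TF 2.x eager mode; the example values are made up) showing that the Huber-style clipped_error already bounds the gradient with respect to delta to [-1, 1], which is why adding clip_by_norm on top amounts to clipping twice:

```python
import tensorflow as tf

# Made-up TD errors: one small, two large in magnitude.
delta = tf.Variable([0.3, 5.0, -7.0])

with tf.GradientTape() as tape:
    # Same construction as the clipped_error quoted above.
    clipped_error = tf.where(tf.abs(delta) < 1.0,
                             0.5 * tf.square(delta),
                             tf.abs(delta) - 0.5)
    loss = tf.reduce_sum(clipped_error)

# Gradient w.r.t. delta is delta itself for small errors and sign(delta)
# for large ones, i.e. already bounded to [-1, 1].
print(tape.gradient(loss, delta).numpy())  # ≈ [0.3, 1.0, -1.0]
```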
It seems to me that in all of the agents you are clipping the loss. This would mean that the gradients are zero for large errors. It might be because the paper "Human-level control through deep reinforcement learning" makes a mistake when talking about clipping the loss. What they actually do in the implementation is:
abs(loss) for abs(loss) > 1, and loss^2 for abs(loss) < 1.
This can be implemented like this:
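A minimal sketch of that piecewise loss in TensorFlow (essentially the clipped_error quoted above; delta stands for the TD error, and the example values are made up; the 0.5 factors, omitted in the prose description, keep the two pieces continuous at |delta| = 1):

```python
import tensorflow as tf

# Huber-style error clipping: quadratic for |delta| < 1, linear otherwise.
# Unlike hard-clipping the loss value to [-1, 1], the gradient never
# collapses to zero for large errors.
def clipped_error(delta):
    return tf.where(tf.abs(delta) < 1.0,
                    0.5 * tf.square(delta),
                    tf.abs(delta) - 0.5,
                    name='clipped_error')

# Example usage with made-up TD errors:
loss = tf.reduce_mean(clipped_error(tf.constant([0.2, 3.0, -4.5])))
```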