https://blog.openai.com/openai-baselines-dqn/ ... In the DQN Nature paper the authors write: “We also found it helpful to clip the error term from the update [...] to be between -1 and 1.” There are two ways to interpret this statement — clip the objective, or clip the multiplicative term when computing gradient. The former seems more natural, but it causes the gradient to be zero on transitions with high error, which leads to suboptimal performance, as found in one DQN implementation. The latter is correct and has a simple mathematical interpretation — Huber Loss. You can spot bugs like these by checking that the gradients appear as you expect —...
I'm really sorry for opening so many issues, but I really love the repo — thank you!
Correct! The same approach — Huber loss combined with gradient clipping (clamping each parameter's gradient with `param.grad.data.clamp_(-1, 1)` in PyTorch) — can be seen in the RL section of the PyTorch tutorials.
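To make the distinction between the two interpretations concrete, here is a minimal dependency-free sketch (function names are ours, not from any library): the Huber loss's derivative is exactly the TD error clamped to [-1, 1], whereas clipping the objective itself zeroes the gradient on high-error transitions.

```python
def huber_loss(x, delta=1.0):
    # Quadratic for |x| <= delta, linear beyond (the Huber loss).
    if abs(x) <= delta:
        return 0.5 * x * x
    return delta * (abs(x) - 0.5 * delta)

def huber_grad(x, delta=1.0):
    # Derivative of the Huber loss w.r.t. x: the error clamped to [-delta, delta].
    return max(-delta, min(delta, x))

def clipped_objective_grad(x, clip=1.0):
    # The other ("clip the objective") interpretation: loss = min(0.5*x^2, clip).
    # Once the squared error exceeds the clip value, the gradient vanishes.
    return x if 0.5 * x * x < clip else 0.0

# Large TD error: Huber still yields a useful gradient of magnitude 1,
# while clipping the objective kills the gradient entirely.
print(huber_grad(5.0))              # 1.0
print(clipped_objective_grad(5.0))  # 0.0
```

This is why the second interpretation trains fine while the first stalls on exactly the transitions that matter most.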