Shmuma / ptan

PyTorch Agent Net: reinforcement learning toolkit for pytorch
MIT License

DQN uses Huber loss instead of MSE loss #13

Open YiTanJang opened 6 years ago

YiTanJang commented 6 years ago

https://blog.openai.com/openai-baselines-dqn/ ... In the DQN Nature paper the authors write: “We also found it helpful to clip the error term from the update [...] to be between -1 and 1.“. There are two ways to interpret this statement — clip the objective, or clip the multiplicative term when computing gradient. The former seems more natural, but it causes the gradient to be zero on transitions with high error, which leads to suboptimal performance, as found in one DQN implementation. The latter is correct and has a simple mathematical interpretation — Huber Loss. You can spot bugs like these by checking that the gradients appear as you expect —...
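To make the distinction concrete, here is a minimal sketch (not from this repo; the networks and batch tensors are placeholders) contrasting the two interpretations the blog post describes: clipping the error term inside a squared loss, which zeroes the gradient on large-error transitions, versus using the Huber loss, which caps the gradient magnitude at 1 without killing it:

```python
import torch
import torch.nn as nn

# Placeholder networks and batch, just to illustrate the loss choice.
q_net = nn.Linear(4, 2)       # stand-in for the online Q-network
tgt_net = nn.Linear(4, 2)     # stand-in for the target network
gamma = 0.99

states = torch.randn(32, 4)
actions = torch.randint(0, 2, (32,))
rewards = torch.randn(32)
next_states = torch.randn(32, 4)
dones = torch.zeros(32)

q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
with torch.no_grad():
    next_q = tgt_net(next_states).max(1)[0]
    target = rewards + gamma * next_q * (1 - dones)

# Interpretation 1 (problematic): clip the TD error before squaring.
# For |error| > 1 the clamp is flat, so the gradient is exactly zero.
clipped_mse = ((q_values - target).clamp(-1, 1) ** 2).mean()

# Interpretation 2 (what the paper meant): Huber loss, quadratic near zero
# and linear for large errors, so the gradient is bounded but never vanishes.
huber = nn.SmoothL1Loss()(q_values, target)
```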

I am really sorry for opening so many issues, but I really love the repo. Thank you!

peter-peng-w commented 6 years ago

Correct! The same combination of Huber loss and gradient clipping (clamping each parameter's gradient with `param.grad.data.clamp_(-1, 1)` in PyTorch) can be seen in the RL part of the official PyTorch tutorials.
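For reference, a minimal sketch of that pattern (Huber loss plus per-parameter gradient clamping), loosely following the PyTorch DQN tutorial; the network, optimizer, and target tensors below are placeholders:

```python
import torch
import torch.nn as nn

policy_net = nn.Linear(4, 2)                          # placeholder online network
optimizer = torch.optim.RMSprop(policy_net.parameters())

q_values = policy_net(torch.randn(32, 4)).max(1)[0]   # placeholder Q predictions
targets = torch.randn(32)                              # placeholder TD targets

loss = nn.SmoothL1Loss()(q_values, targets)            # Huber loss (delta = 1)

optimizer.zero_grad()
loss.backward()
# Clamp each parameter's gradient to [-1, 1] before the optimizer step.
for param in policy_net.parameters():
    param.grad.data.clamp_(-1, 1)
optimizer.step()
```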