datawhalechina / easy-rl

A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online: https://datawhalechina.github.io/easy-rl/

distinguish Qnet and set detach() on TD target #117

Closed tlt18 closed 1 year ago

tlt18 commented 1 year ago

I found two problems with the Double DQN code.

  1. The TD target is computed without detach(), so gradients also flow through the target; this is equivalent to not using the semi-gradient method.
  2. During debugging, I found that the outputs of policy_net and target_net are always identical. This is because they share the same underlying network, so target_net changes immediately whenever policy_net changes.
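
The two points above can be illustrated with a minimal PyTorch sketch. This is not the repository's code or the PR's diff: the network shape, the hyperparameters, and the random batch are assumptions made up for illustration. It gives target_net its own copy of the parameters and detaches the Double DQN TD target before computing the loss.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes and hyperparameters, chosen only for illustration.
state_dim, n_actions, batch_size, gamma = 4, 2, 8, 0.99

policy_net = nn.Sequential(
    nn.Linear(state_dim, 16), nn.ReLU(), nn.Linear(16, n_actions)
)
# deepcopy gives target_net its own parameters; `target_net = policy_net`
# would only create a second name for the same module.
target_net = copy.deepcopy(policy_net)
target_net.eval()

optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.1)

# Random stand-in batch (not real environment data).
states = torch.randn(batch_size, state_dim)
actions = torch.randint(0, n_actions, (batch_size, 1))
rewards = torch.randn(batch_size, 1)
next_states = torch.randn(batch_size, state_dim)
dones = torch.zeros(batch_size, 1)

q_values = policy_net(states).gather(1, actions)

# Double DQN: policy_net selects the next action, target_net evaluates it.
next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
next_q = target_net(next_states).gather(1, next_actions)
# detach() makes the TD target a constant, so the loss gradient flows only
# through q_values (the semi-gradient update).
td_target = (rewards + gamma * next_q * (1 - dones)).detach()

loss = F.mse_loss(q_values, td_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# After the step, the two networks should disagree until the next sync.
with torch.no_grad():
    outputs_differ = not torch.equal(policy_net(states), target_net(states))

# Periodic hard sync of the target network (copies values, keeps storage separate).
target_net.load_state_dict(policy_net.state_dict())
```

With separate parameters, policy_net and target_net should produce different outputs between syncs, which is a quick way to check for the aliasing described in point 2.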

johnjim0816 commented 1 year ago

Thanks for your PR; I will consider your suggestions. We are currently moving all algorithms to a new template, so I cannot merge your PR right now. I will add an acknowledgement to you when I update Double DQN.