Shivanshu-Gupta / Pytorch-Double-DQN

Pytorch Implementation of Double DQN Algorithm
9 stars 0 forks

Get actions for next states by 'Qnet' or 'target_Qnet'? #1

Closed zuzhaoye closed 4 years ago

zuzhaoye commented 4 years ago

Hello, thank you so much for sharing this code structure! There is one thing in your code that I am not very sure about.

You obtain next_state_actions from Qnet, but the following webpage suggests that we should get next_state_actions from target_Qnet. May I know whether there is theoretical support for your choice, or whether it is simply an error? Much appreciated.


Shivanshu-Gupta commented 4 years ago

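It looks to me like the mistake is in the blog you shared. Check out the Double DQN paper. Specifically, this is the target according to the paper:

$$Y_t^{\text{DoubleDQN}} = R_{t+1} + \gamma \, Q\big(S_{t+1}, \arg\max_a Q(S_{t+1}, a; \theta_t); \theta_t^-\big)$$

That is, the primary network ($\theta_t$, Qnet here) picks the greedy action for the next state, and the target network ($\theta_t^-$, target_Qnet here) supplies the Q-value of that action. You may also refer to this blog instead.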
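For anyone comparing the two variants, here is a minimal PyTorch sketch of that target computation. The names Qnet, target_Qnet and next_state_actions come from this thread; the function name `double_dqn_target`, its argument layout, and the toy Q-tables are assumptions made for illustration and are not taken from the repository's code.

```python
import torch


def double_dqn_target(Qnet, target_Qnet, rewards, next_states, dones, gamma=0.99):
    """Hypothetical helper: Qnet selects the action, target_Qnet evaluates it."""
    with torch.no_grad():
        # Action selection: greedy with respect to the primary network.
        next_state_actions = Qnet(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation: Q-value of that action under the target network.
        next_q = target_Qnet(next_states).gather(1, next_state_actions).squeeze(1)
    # Terminal transitions (dones == 1) get no bootstrap term.
    return rewards + gamma * (1.0 - dones) * next_q


# Toy check: fixed Q-tables stand in for the two networks (illustrative values).
online_q = torch.tensor([[1.0, 2.0], [5.0, 0.0]])    # primary network's Q-values
target_q = torch.tensor([[20.0, 10.0], [30.0, 40.0]])  # target network's Q-values
Qnet = lambda s: online_q[s]
target_Qnet = lambda s: target_q[s]

next_states = torch.tensor([0, 1])
rewards = torch.tensor([1.0, 1.0])
dones = torch.tensor([0.0, 1.0])  # second transition is terminal

targets = double_dqn_target(Qnet, target_Qnet, rewards, next_states, dones, gamma=0.5)
print(targets)  # tensor([6., 1.])
```

In state 0 the primary network prefers action 1, so the target uses the target network's value for action 1 (10.0), giving 1 + 0.5 * 10 = 6. Selecting the action with target_Qnet instead would pick action 0 (20.0) and give 11, which is the ordinary DQN target rather than the Double DQN one.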

Shivanshu-Gupta commented 4 years ago

Closing issue now. Feel free to let me know if you need more help.