zuzhaoye closed this issue 4 years ago
Looks to me like the mistake is in the blog you shared. Check out the Double DQN paper. Specifically, this is the target according to the paper (with `θ_t` the primary/online network parameters and `θ_t⁻` the target network parameters):

`Y_t = R_{t+1} + γ · Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ_t); θ_t⁻)`
i.e. the target network is used to get the Q-value for the next state when acting greedily according to the primary network. You may also refer to this blog instead.
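To make the distinction concrete, here is a minimal PyTorch sketch of that target computation (the function and argument names are illustrative, not taken from the repo): the *online* network selects the greedy next action, and the *target* network evaluates it.

```python
import torch

def double_dqn_target(q_net, target_q_net, rewards, next_states, dones, gamma=0.99):
    """Compute the Double DQN target: select actions with the online net,
    evaluate them with the target net. All tensor names are illustrative."""
    with torch.no_grad():
        # Greedy action selection uses the ONLINE network (q_net)
        next_state_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation uses the TARGET network (target_q_net)
        next_q = target_q_net(next_states).gather(1, next_state_actions).squeeze(1)
    # Zero out the bootstrap term at terminal transitions
    return rewards + gamma * next_q * (1.0 - dones)
```

Swapping the two networks (selecting with `target_q_net`) would collapse back toward ordinary DQN's max-over-target and reintroduce the overestimation bias Double DQN is meant to reduce.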
Closing issue now. Feel free to let me know if you need more help.
Hello, thank you so much for sharing this code structure! There is one thing in your code I'm not quite sure about.
https://github.com/Shivanshu-Gupta/Pytorch-Double-DQN/blob/1cff44d95d7881c6afc029b734508b1a705dfe14/agent.py#L94-L98
You obtain `next_state_actions` from `Qnet`, but the following webpage suggests that we should get `next_state_actions` from `target_Qnet`. May I know whether there is theoretical support for your choice, or is it simply an error? Very much appreciated.

Website: https://towardsdatascience.com/double-deep-q-networks-905dd8325412