zuzhaoye closed this issue 4 years ago
Looks to me like the mistake is in the blog you shared. Check out the Double DQN paper. Specifically, this is the target according to the paper (with `θ_t` the primary/online network parameters and `θ_t⁻` the target network parameters):

`Y_t = R_{t+1} + γ · Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ_t); θ_t⁻)`
i.e. the target network is used to get the Q-value for the next state when acting greedily according to the primary network. You may also refer to this blog instead.
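To make the distinction concrete, here is a minimal PyTorch sketch of that target computation (the function and argument names are illustrative, not taken from the repo): the *online* network selects the greedy next action, and the *target* network evaluates it.

```python
import torch

def double_dqn_target(q_net, target_q_net, rewards, next_states, dones, gamma=0.99):
    """Compute the Double DQN target: select actions with the online net,
    evaluate them with the target net. All tensor names are illustrative."""
    with torch.no_grad():
        # Greedy action selection uses the ONLINE network (q_net)
        next_state_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation uses the TARGET network (target_q_net)
        next_q = target_q_net(next_states).gather(1, next_state_actions).squeeze(1)
    # Zero out the bootstrap term at terminal transitions
    return rewards + gamma * next_q * (1.0 - dones)
```

Swapping the two networks (selecting with `target_q_net`) would collapse back toward ordinary DQN's max-over-target and reintroduce the overestimation bias Double DQN is meant to reduce.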
Closing issue now. Feel free to let me know if you need more help.
Hello, thank you so much for sharing this code structure! There is one thing in your code I'm not quite sure about.
https://github.com/Shivanshu-Gupta/Pytorch-Double-DQN/blob/1cff44d95d7881c6afc029b734508b1a705dfe14/agent.py#L94-L98
You obtain `next_state_actions` from `Qnet`, but the following webpage suggests that we should get `next_state_actions` from `target_Qnet`. May I know whether there is theoretical support for your choice, or is it simply an error? Very much appreciated.

Website: https://towardsdatascience.com/double-deep-q-networks-905dd8325412