germain-hug / Deep-RL-Keras

Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)

DDQN.py function memorize: incorrect Q values? #26

Closed surfingkaka closed 4 years ago

surfingkaka commented 4 years ago

It seems incorrect if I compare against the [PER paper](https://arxiv.org/pdf/1511.05952.pdf):

Algorithm 1, line 11, the TD error: `delta_j = R_j + gamma_j * Q_target(S_j, argmax_a Q(S_j, a)) - Q(S_{j-1}, A_{j-1})`

If I am not mistaken, the j-1 subscript corresponds to the current state in the implementation, i.e. `state`, `action`, `reward`, and `done` all refer to index j-1, while `new_state` refers to index j.

Then line 125 in `ddqn.py` takes the arg max over the current state rather than the successor state:

```python
q_val = self.agent.predict(state)
next_best_action = np.argmax(q_val)
```

when it should be:

```python
q_val = self.agent.predict(new_state)
next_best_action = np.argmax(q_val)
```
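
For reference, here is a minimal, self-contained sketch of the Double DQN TD-error computation with that fix applied. It assumes `agent` and `target_agent` each expose a `predict(state)` method returning a 1-D array of Q-values, as in the snippets above; the helper name and signature are illustrative, not the repo's actual code.

```python
import numpy as np

def double_dqn_td_error(agent, target_agent, transition, gamma=0.99):
    """Sketch of the Double DQN TD error for a single stored transition.

    `agent` and `target_agent` are assumed to return a 1-D array of
    Q-values from predict(state), as in the snippets quoted in the issue.
    """
    state, action, reward, done, new_state = transition

    # Current estimate Q(S_{j-1}, A_{j-1})
    q_current = agent.predict(state)[action]

    if done:
        target = reward
    else:
        # Select the best action with the online network on the successor state ...
        next_best_action = np.argmax(agent.predict(new_state))
        # ... and evaluate it with the target network (the Double DQN decoupling).
        target = reward + gamma * target_agent.predict(new_state)[next_best_action]

    # TD error delta_j = target - Q(S_{j-1}, A_{j-1})
    return target - q_current
```

With `new_state` inside the arg max, the action selection matches `argmax_a Q(S_j, a)` from line 11 of the algorithm, while the target network still evaluates the chosen action.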