germain-hug / Deep-RL-Keras

Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)

Mistake in prioritised replay? #4

Closed Khev closed 5 years ago

Khev commented 5 years ago

Hello again,

FYI: I think you might have defined the TD error incorrectly in "Deep-RL-Keras/DDQN/ddqn.py". On line 125 you have

""" q_val = self.agent.predict(new_state) ## I think the argument should be 'state' here q_val_t = self.agent.target_predict(new_state) next_best_action = np.argmax(q_val) new_val = reward + self.gamma * q_val_t[0, next_best_action] td_error = abs(new_val - q_val)[0] """

But I think the correct definition is

td_error = abs( Q(s,a) - yi ) with yi = ri + gamma*max( Q(s', a') )
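For what it's worth, a minimal sketch of that definition in code (the helper name compute_td_error is hypothetical, not from the repo; it assumes predict and target_predict return arrays of shape (1, num_actions) as in ddqn.py, evaluates the max over the target network, and omits terminal-state handling):

    import numpy as np

    def compute_td_error(agent, state, action, reward, new_state, gamma):
        # Q(s, .) from the online network, Q_target(s', .) from the target network
        q_val = agent.predict(state)
        q_next_t = agent.target_predict(new_state)
        # yi = ri + gamma * max_a' Q_target(s', a')
        target = reward + gamma * np.max(q_next_t)
        # |Q(s, a) - yi|, used as the priority for prioritized replay
        return abs(q_val[0, action] - target)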

germain-hug commented 5 years ago

Apologies for the late response! I believe you are correct, that was indeed a typo. I've just fixed the issue, thanks for reporting it.

YidingYu commented 5 years ago

I think there is still a problem with the latest code:

        if(self.with_per):
            q_val = self.agent.predict(state)
            q_val_t = self.agent.target_predict(new_state)
            next_best_action = np.argmax(q_val)
            new_val = reward + self.gamma * q_val_t[0, next_best_action]
            td_error = abs(new_val - q_val)[0]

"next_best_action = np.argmax(q_val)" should be "next_best_action = np.argmax(self.agent.predict(new_state))".

Xiaoyu006 commented 4 years ago

> I think there is still a problem with the latest code:
>
>         if(self.with_per):
>             q_val = self.agent.predict(state)
>             q_val_t = self.agent.target_predict(new_state)
>             next_best_action = np.argmax(q_val)
>             new_val = reward + self.gamma * q_val_t[0, next_best_action]
>             td_error = abs(new_val - q_val)[0]
>
> "next_best_action = np.argmax(q_val)" should be "next_best_action = np.argmax(self.agent.predict(new_state))".

I think you are right.