I just have a question on this line in your code
(https://github.com/Mononofu/reinforcement-learning/blob/master/lua/nnlearner.lua#L81):

self.target[action_index] = pred[action_index] + self.learningRate * (reward + self.discountFactor * max_q - pred[action_index])

To the best of my knowledge, since you are using MSE as the criterion, isn't the self.target[action_index] value supposed to be just:

self.target[action_index] = reward + self.discountFactor * max_q

as in this line done by someone else: https://github.com/blakeMilner/DeepQLearning/blob/master/deepqlearn.lua#L359 ?

Two things in your line I am not quite sure about:

1) It subtracts pred[action_index]; isn't that subtraction supposed to happen inside the MSE criterion?
2) It introduces self.learningRate; in my opinion the learning rate belongs to the weight update, which your code already applies in updateParameters(self.learningRate).

Thank you.
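P.S. For reference, here is a minimal sketch of the update I would expect (Torch7 nn; the network shape and the qUpdate/stateSize/numActions names are hypothetical, not from your repo). The target for the taken action is the plain Bellman value, the MSE criterion computes the difference, and the learning rate appears only in updateParameters():

local nn = require 'nn'

local stateSize, numActions = 4, 2            -- hypothetical sizes
local net = nn.Sequential():add(nn.Linear(stateSize, numActions))
local criterion = nn.MSECriterion()

-- state/nextState: 1D tensors of size stateSize
local function qUpdate(state, action_index, reward, nextState,
                       discountFactor, learningRate)
  local pred = net:forward(state):clone()     -- Q(s, .); clone before re-forward
  local max_q = net:forward(nextState):max()  -- max over a' of Q(s', a')

  local target = pred:clone()
  -- Bellman target for the chosen action only; the other entries keep the
  -- current predictions, so their MSE error (and gradient) is zero.
  target[action_index] = reward + discountFactor * max_q

  net:zeroGradParameters()
  local out = net:forward(state)              -- fresh forward for the backward pass
  criterion:forward(out, target)
  net:backward(state, criterion:backward(out, target))
  net:updateParameters(learningRate)          -- the only place the rate is used
end

If I read your line correctly, the criterion instead sees an error of learningRate * (reward + discountFactor * max_q - pred[action_index]), so the learning rate seems to be applied twice: once in the target and once in the weight update.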