I just have a question on this line in your code
(https://github.com/Mononofu/reinforcement-learning/blob/master/lua/nnlearner.lua#L81):

self.target[action_index] = pred[action_index] + self.learningRate * (reward + self.discountFactor * max_q - pred[action_index])

To the best of my knowledge, since you are using MSE as the criterion, isn't the self.target[action_index] value supposed to be just:

self.target[action_index] = reward + self.discountFactor * max_q

as in this line done by someone else: https://github.com/blakeMilner/DeepQLearning/blob/master/deepqlearn.lua#L359 ?

Two things in your line I am not quite sure about:

1) It subtracts pred[action_index]; isn't that subtraction supposed to happen inside the MSE criterion?
2) It introduces self.learningRate; in my opinion the learning rate belongs to the weight update, which your code already applies in updateParameters(self.learningRate).

Thank you.
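P.S. For reference, here is a minimal sketch of the update I would expect (Torch7 nn; the network shape and the qUpdate/stateSize/numActions names are hypothetical, not from your repo). The target for the taken action is the plain Bellman value, the MSE criterion computes the difference, and the learning rate appears only in updateParameters():

local nn = require 'nn'

local stateSize, numActions = 4, 2            -- hypothetical sizes
local net = nn.Sequential():add(nn.Linear(stateSize, numActions))
local criterion = nn.MSECriterion()

-- state/nextState: 1D tensors of size stateSize
local function qUpdate(state, action_index, reward, nextState,
                       discountFactor, learningRate)
  local pred = net:forward(state):clone()     -- Q(s, .); clone before re-forward
  local max_q = net:forward(nextState):max()  -- max over a' of Q(s', a')

  local target = pred:clone()
  -- Bellman target for the chosen action only; the other entries keep the
  -- current predictions, so their MSE error (and gradient) is zero.
  target[action_index] = reward + discountFactor * max_q

  net:zeroGradParameters()
  local out = net:forward(state)              -- fresh forward for the backward pass
  criterion:forward(out, target)
  net:backward(state, criterion:backward(out, target))
  net:updateParameters(learningRate)          -- the only place the rate is used
end

If I read your line correctly, the criterion instead sees an error of learningRate * (reward + discountFactor * max_q - pred[action_index]), so the learning rate seems to be applied twice: once in the target and once in the weight update.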