Mononofu / reinforcement-learning

Implementing exercises from Reinforcement Learning: An Introduction
Could you please enlighten me on this line? Thanks! #1

Closed: hohoCode closed this issue 8 years ago

hohoCode commented 9 years ago

I have a question about this line in your code: self.target[action_index] = pred[action_index] + self.learningRate * (reward + self.discountFactor * max_q - pred[action_index]) (https://github.com/Mononofu/reinforcement-learning/blob/master/lua/nnlearner.lua#L81)

As far as I know, since you are using MSE as the criterion, shouldn't the value of self.target[action_index] simply be: self.target[action_index] = reward + self.discountFactor * max_q?

That is how someone else does it on this line: https://github.com/blakeMilner/DeepQLearning/blob/master/deepqlearn.lua#L359

I am unsure about two things in this line: 1) It subtracts pred[action_index]; isn't that subtraction already done inside the MSE criterion? 2) It introduces self.learningRate here; in my opinion the learning rate is for updating the weights, and it is already applied in updateParameters(self.learningRate) in your code.
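To make the comparison concrete, here is a minimal sketch in plain Python (not the repository's Lua/Torch code) of how the two target choices affect the squared-error gradient for a single output. All numeric values and the helper mse_grad are hypothetical, and the loss is assumed to be a plain (pred - target)^2 on that one output; averaging over several outputs would only change a constant factor.

```python
# Plain-Python sketch; every number below is made up for illustration.

def mse_grad(pred, target):
    """Gradient of the squared error (pred - target)**2 with respect to pred."""
    return 2.0 * (pred - target)

# Hypothetical values for a single transition.
pred = 1.0              # current network output Q(s, a)
reward = 0.5
discount_factor = 0.9
max_q = 2.0             # max over a' of Q(s', a')
learning_rate = 0.1

td_target = reward + discount_factor * max_q
td_error = td_target - pred

# Variant A: regress directly onto the TD target.
target_a = td_target
# Variant B: the form quoted in the question above.
target_b = pred + learning_rate * td_error

grad_a = mse_grad(pred, target_a)
grad_b = mse_grad(pred, target_b)

# Both gradients point the same way; B is A scaled by learning_rate.
print(grad_a, grad_b)
```

Under these assumptions the two targets give gradients in the same direction, and the second form is the first scaled by learning_rate. If a plain gradient step then applies the learning rate again, the effective step on that output would scale with the learning rate squared rather than linearly.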

Thank you.

hohoCode commented 8 years ago

I think I figured out the problem myself. Thanks for your open-source code.