Kaixhin / ACER

Actor-critic with experience replay
MIT License
251 stars 46 forks

Detach z star p. #13

Closed: random-user-x closed this issue 6 years ago

random-user-x commented 6 years ago

@Kaixhin I think it is better to detach z_star_p. Please let me know how you feel about this.
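Some background on why detaching matters. In the ACER paper, z* is used as a fixed gradient for the distribution statistics φ. The network parameters receive ∂φ/∂θ applied to z*, and no gradient should flow back through the computation of z* itself. In an autograd framework, detaching gives exactly this behaviour. With a surrogate loss L = −z*ᵀφ and z* detached (e.g. `z_star_p.detach()` in PyTorch), ∂L/∂φ is exactly −z*.

The NumPy sketch below has no autograd, so "detached" simply means z* is held fixed. It backpropagates −z* through a softmax into the logits and checks the result against finite differences. The helper names, the logits, and the z* values are hypothetical illustrations, not code from the repository.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D vector of logits.
    shifted = logits - logits.max()
    e = np.exp(shifted)
    return e / e.sum()


def surrogate_loss(logits, z_star):
    # z_star is treated as a constant (the "detached" target):
    # only phi = softmax(logits) depends on the logits.
    phi = softmax(logits)
    return -float(np.dot(z_star, phi))


def surrogate_grad(logits, z_star):
    # dL/dlogits = J^T (-z_star), with J = diag(phi) - phi phi^T the softmax Jacobian.
    phi = softmax(logits)
    jacobian = np.diag(phi) - np.outer(phi, phi)
    return jacobian.T @ (-z_star)


logits = np.array([0.2, -0.1, 0.5])
z_star = np.array([1.1, -0.8, 0.13])  # hypothetical trust-region-corrected gradient

analytic = surrogate_grad(logits, z_star)

# Central finite differences of the surrogate with z_star frozen.
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(len(logits)):
    step = np.zeros_like(logits)
    step[i] = eps
    numeric[i] = (surrogate_loss(logits + step, z_star)
                  - surrogate_loss(logits - step, z_star)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Minimising L with z* held fixed moves the logits along Jᵀz*. This matches the paper's ascent step θ ← θ + α (∂φ/∂θ)ᵀ z*. If z* were not detached, autograd would also differentiate through the KL and dot-product terms used to build z*. That would add extra terms to this gradient.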

random-user-x commented 6 years ago

@Kaixhin, I think you have approximated the trust region update incorrectly. The present implementation might not be the one discussed in the paper. Would you like me to open a PR with the correct implementation?

Kaixhin commented 6 years ago

Yes, the update should definitely be done between the softmax and the input to the softmax, rather than on the parameters of the policy head, so the code does not follow the paper at the moment. If you've got the correct implementation, please open a PR.
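For reference, here is a minimal NumPy sketch of the ACER trust region step applied to the distribution statistics φ rather than to the policy-head parameters. It assumes φ is the softmax output, i.e. the action probabilities. The sketch computes z* = ĝ − max(0, (kᵀĝ − δ)/‖k‖²) k, where:

- ĝ is the gradient of the (maximised) objective with respect to φ.
- k is the gradient of KL(φ_avg ‖ φ) with respect to φ, and φ_avg is the average policy network's output.

The function names, the threshold `delta`, and all numbers are hypothetical illustrations, not the repository's code.

```python
import numpy as np


def kl_grad_wrt_probs(avg_probs, probs):
    # KL(avg || p) = sum_i a_i * (log a_i - log p_i), so dKL/dp_i = -a_i / p_i.
    return -avg_probs / probs


def trust_region_projection(g, k, delta):
    # z* = g - max(0, (k.g - delta) / ||k||^2) * k
    k_dot_k = float(np.dot(k, k))
    if k_dot_k == 0.0:
        # No KL direction to project along; keep the original gradient.
        return g.copy()
    scale = max(0.0, (float(np.dot(k, g)) - delta) / k_dot_k)
    return g - scale * k


probs = np.array([0.5, 0.3, 0.2])      # current policy pi(.|x), the softmax output
avg_probs = np.array([0.4, 0.4, 0.2])  # average policy network output
g = np.array([1.0, -1.0, 0.0])         # hypothetical objective gradient w.r.t. probs
delta = 0.1                            # hypothetical trust region threshold

k = kl_grad_wrt_probs(avg_probs, probs)
z_star = trust_region_projection(g, k, delta)

# Here k.g exceeds delta, so the constraint is active and z* lies on k.z = delta.
print(np.dot(k, g), np.dot(k, z_star))
```

The resulting z* is then backpropagated from φ into the network, for example through the softmax into its logits and on to the parameters. This is the sense in which the update happens "between the softmax and the input to the softmax". If kᵀĝ ≤ δ, the projection leaves ĝ unchanged.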