Kaixhin / ACER

Actor-critic with experience replay
MIT License

Trust Region Updates #12

Closed by random-user-x 6 years ago

random-user-x commented 6 years ago

Hello @Kaixhin. At https://github.com/Kaixhin/ACER/blob/f22b07cebd9ec278c5b604b2652e6657df4b61ab/train.py#L97, I think we should freeze the value of z_star_p by using z_star_p.detach().

From the paper:

"In the second stage, we take advantage of back-propagation. Specifically, the updated gradient with respect to φ_θ, that is z*, is back-propagated through the network to compute the derivatives with respect to the parameters."
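
Roughly what I have in mind, as a self-contained sketch. The tensors, shapes, and delta here are toy stand-ins rather than the actual values in train.py, so this is illustrative only:

```python
import torch
import torch.nn.functional as F

delta = 1.0  # trust region constraint (hypothetical value)

# Toy stand-in for a policy network: theta are the parameters,
# phi plays the role of phi_theta (the policy logits).
theta = torch.randn(5, 6, requires_grad=True)
phi = torch.ones(1, 5) @ theta
pi = F.softmax(phi, dim=1)
avg_pi = torch.full_like(pi, 1.0 / pi.size(1))  # stand-in for the average policy

# g: gradient of the ACER objective w.r.t. phi (a made-up differentiable
# expression here, still attached to the autograd graph as in train.py).
g = pi - 0.5
# k: gradient of D_KL(avg_pi || pi) w.r.t. the logits phi.
k = pi - avg_pi

# z* = g - max(0, (k.g - delta) / ||k||^2) * k
scale = (((k * g).sum(1) - delta) / (k * k).sum(1).clamp(min=1e-10)).clamp(min=0)
z_star_p = g - scale.unsqueeze(1) * k

# Second stage: back-propagate z* through phi into theta. Detaching z_star_p
# treats it as a constant, so the gradient of the surrogate w.r.t. theta is
# exactly (dphi/dtheta)^T z*. Without .detach(), autograd also differentiates
# through g and k inside z*, and extra gradient leaks into theta.
surrogate = (phi * z_star_p.detach()).sum()
surrogate.backward()
```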

Please let me know what you think.

random-user-x commented 6 years ago

#13

Kaixhin commented 6 years ago

Ah yes, the gradients are probably leaking through z_star_p; I think you are right on this one.