Kaixhin / ACER

Actor-critic with experience replay
MIT License
251 stars, 46 forks

Changed trust region settings. Correct implementation. #15

Closed. random-user-x closed this 6 years ago

random-user-x commented 6 years ago

Paper implementation of trust region updates.
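For readers unfamiliar with the trust region step described in the ACER paper, the following is a minimal sketch and not the code from this pull request. It assumes `g` is the gradient of the objective with respect to the policy's distribution statistics, `k` is the gradient of the KL divergence between the average policy and the current policy with respect to the same statistics, and `delta` is the trust region threshold. The function name and the use of plain Python lists are hypothetical choices for illustration.

```python
def trust_region_update(g, k, delta):
    """Return z = g - max(0, (k.g - delta) / ||k||^2) * k.

    This is the closed-form solution of: minimise 0.5 * ||g - z||^2
    subject to the linearised KL constraint k.z <= delta.
    All names here are illustrative assumptions, not the repository's API.
    """
    k_dot_g = sum(ki * gi for ki, gi in zip(k, g))
    k_norm_sq = sum(ki * ki for ki in k)
    if k_norm_sq == 0.0:
        # No KL gradient: the constraint is inactive, keep g unchanged.
        return list(g)
    # The scale is zero when g already satisfies the constraint.
    scale = max(0.0, (k_dot_g - delta) / k_norm_sq)
    return [gi - scale * ki for gi, ki in zip(g, k)]
```

When `g` already satisfies the constraint the update leaves it untouched; otherwise it removes just enough of the component along `k` to bring `k.z` down to `delta`.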

random-user-x commented 6 years ago

I believe this is the correct implementation. The variance has decreased, as discussed in the paper. I haven't tuned the hyperparameters yet; a single random run gives this result (variance is evaluated every 5000 steps). [plot: newplot 6]

I haven't used --lr-decay. I think enabling it will make learning smoother.

Kaixhin commented 6 years ago

Great, thank you very much.