coreylynch / async-rl

Tensorflow + Keras + OpenAI Gym implementation of 1-step Q Learning from "Asynchronous Methods for Deep Reinforcement Learning"
MIT License

Stop the actor gradient from flowing through the critic #8

Open · danijar opened 8 years ago

danijar commented 8 years ago

I think you should use tf.stop_gradient() in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164. Otherwise the actor's gradient also updates the critic's weights, and after some training the policy tends to use one action exclusively. It took me a while to figure this out in my own code, too.
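For illustration, a minimal sketch of the fix, assuming TF1-style graph code of the era; the tensor names (`s_t`, `a_t`, `R_t`, `p_network`, `v_network`) are hypothetical stand-ins, not the repo's actual variables:

```python
import tensorflow as tf

n_actions = 4  # illustrative action count

# Hypothetical stand-ins for the tensors around a3c.py#L164.
s_t = tf.placeholder(tf.float32, [None, 84])          # state features
a_t = tf.placeholder(tf.float32, [None, n_actions])   # one-hot actions taken
R_t = tf.placeholder(tf.float32, [None])              # discounted returns

hidden = tf.layers.dense(s_t, 64, activation=tf.nn.relu)
p_network = tf.layers.dense(hidden, n_actions, activation=tf.nn.softmax)  # actor
v_network = tf.squeeze(tf.layers.dense(hidden, 1), axis=1)                # critic

log_prob = tf.log(tf.reduce_sum(p_network * a_t, axis=1) + 1e-8)

# stop_gradient treats the critic's value as a constant baseline, so the
# actor loss cannot push gradients into the critic's weights.
advantage = R_t - tf.stop_gradient(v_network)
p_loss = -tf.reduce_mean(log_prob * advantage)

# The critic is still trained against the returns by its own loss term.
v_loss = tf.reduce_mean(tf.square(R_t - v_network))
total_loss = p_loss + 0.5 * v_loss
```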

coreylynch commented 8 years ago

Oh interesting! I will definitely take a look at this. Thank you.

steveKapturowski commented 8 years ago

I noticed this as well and believe it's a significant cause of performance degradation. Additionally, you don't seem to be adding the entropy term to the objective, which the paper mentions as useful for improving exploration.
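A hedged sketch of that entropy bonus, continuing the hypothetical tensors from the sketch above (`p_network`, `log_prob`, `advantage`, `v_loss`); the weight `beta` is illustrative, as the paper leaves it as a hyperparameter:

```python
# Entropy of the policy distribution; adding it to the objective discourages
# premature convergence to a single action.
entropy = -tf.reduce_sum(p_network * tf.log(p_network + 1e-8), axis=1)

beta = 0.01  # illustrative entropy weight
p_loss_with_entropy = -tf.reduce_mean(log_prob * advantage + beta * entropy)
total_loss = p_loss_with_entropy + 0.5 * v_loss
```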