Unstable reinforce with baseline model

dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

http://www.wildml.com/2016/10/learning-reinforcement-learning/

MIT License


Unstable reinforce with baseline model #192

Open Jacobi93 opened 5 years ago

Jacobi93 commented 5 years ago

Hi, thank you for your wonderful code; it has helped me a lot. In the REINFORCE with baseline example for cliff_walking, I could not obtain stable results. The best reward should be -15, as you plotted, but sometimes when I run the code without any changes it converges to -100, which is very weird. Could anyone run the code several times and find out why that is? Thank you so much.
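A minimal sketch for probing the question without TensorFlow: tabular REINFORCE with a learned state-value baseline (softmax preferences θ(s, a) plus a table V(s)) on a 4 × 12 cliff grid, written in plain Python rather than using the repository's estimator classes. Everything here is an assumption for illustration, not the repository's code: the names `step` and `run`, the step sizes, the 200-step episode cap, rewards of -1 per move and -100 for entering a cliff cell, and the rule that entering the cliff ends the episode. Under that last assumption, an episode return of exactly -100 means the agent stepped into the cliff on its very first move. A policy that settles at -100 would therefore have learned to end episodes immediately rather than wander. Whether that is what happens in the original notebook is only a hypothesis to check against its environment.

```python
import math
import random

# Assumed 4 x 12 cliff grid: start bottom-left, goal bottom-right,
# cliff cells along the bottom row between them.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
MAX_STEPS = 200  # assumed cap so early, near-random episodes terminate


def step(pos, a):
    """Assumed dynamics: -1 per move; entering the cliff gives -100 and ends the episode."""
    r = min(max(pos[0] + ACTIONS[a][0], 0), ROWS - 1)
    c = min(max(pos[1] + ACTIONS[a][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:
        return (r, c), -100.0, True
    return (r, c), -1.0, (r, c) == GOAL


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run(seed, episodes=300, alpha_theta=0.01, alpha_w=0.1, gamma=1.0):
    """Hypothetical helper: train from scratch with one seed, return per-episode returns."""
    rng = random.Random(seed)
    theta = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    v = {s: 0.0 for s in theta}  # tabular baseline V(s)
    returns = []
    for _ in range(episodes):
        pos, traj, done = START, [], False
        while not done and len(traj) < MAX_STEPS:
            a = rng.choices(range(len(ACTIONS)), weights=softmax(theta[pos]))[0]
            nxt, reward, done = step(pos, a)
            traj.append((pos, a, reward))
            pos = nxt
        g = 0.0
        for t in reversed(range(len(traj))):  # backwards so g accumulates G_t
            s, a, reward = traj[t]
            g = reward + gamma * g
            delta = g - v[s]  # baseline-corrected return G_t - V(s_t)
            v[s] += alpha_w * delta
            probs = softmax(theta[s])
            for b in range(len(ACTIONS)):
                # d/d theta[s][b] of log pi(a|s) for a softmax policy: 1[b == a] - pi(b|s)
                theta[s][b] += alpha_theta * gamma**t * delta * (float(b == a) - probs[b])
        returns.append(sum(reward for _, _, reward in traj))
    return returns


if __name__ == "__main__":
    for seed in range(5):
        rets = run(seed)
        print(f"seed {seed}: mean return over last 20 episodes = {sum(rets[-20:]) / 20:.1f}")
```

Comparing the printed averages across seeds is one way to see how much the outcome depends on initialization and action sampling; no particular result is claimed here.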

JaySiu commented 5 years ago

Same here: the algorithm doesn't converge the way the example does. But off-policy Q-learning with linear function approximation is not guaranteed to converge, according to David Silver's lecture notes 6, page 32. It is interesting how the original example managed to converge.
My result: [plot not preserved]
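The lecture-note result refers to off-policy value learning with linear function approximation, whereas REINFORCE with baseline is an on-policy Monte Carlo policy-gradient method. As a sketch of what "not guaranteed to converge" can look like in that off-policy setting, here is the two-state example from Sutton and Barto's treatment of off-policy divergence: the two estimated values are w and 2w (features 1 and 2), and the transition from the first state to the second has reward 0. If only that transition is ever updated, as can happen under an off-policy update distribution, each semi-gradient TD(0) step gives w ← w + α(0 + γ·2w − w)·1 = (1 + α(2γ − 1))w, so w grows without bound whenever γ > 0.5. The function name `w_to_2w` and the values of α, γ and the update count are chosen for illustration.

```python
def w_to_2w(w0=1.0, alpha=0.1, gamma=0.9, updates=100):
    """Apply semi-gradient TD(0) only to the first->second transition; return all w values."""
    # Linear features: phi(first) = 1, phi(second) = 2, so v(first) = w, v(second) = 2w.
    w = w0
    history = [w]
    for _ in range(updates):
        td_error = 0.0 + gamma * (2.0 * w) - 1.0 * w  # r + gamma * v(second) - v(first)
        w += alpha * td_error * 1.0  # times the gradient of v(first) w.r.t. w, i.e. phi(first) = 1
        history.append(w)
    return history


if __name__ == "__main__":
    for gamma in (0.4, 0.9):
        print(f"gamma={gamma}: w after 100 updates = {w_to_2w(gamma=gamma)[-1]:.4g}")
```

With α = 0.1, each update multiplies w by 1.08 for γ = 0.9 and by 0.98 for γ = 0.4, so the same update rule diverges in one case and shrinks toward 0 in the other.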

Jacobi93 commented 5 years ago

"Not guaranteed" means that it may converge, but convergence is not guaranteed. Different initializations and random policies may lead to different results, but maybe it would be better for the author to mention this. Thanks.