dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
http://www.wildml.com/2016/10/learning-reinforcement-learning/
MIT License

Provided policy_improvement() solution is not guaranteed to terminate #203

Open link2xt opened 5 years ago

link2xt commented 5 years ago

To set the policy_stable variable, the provided code checks whether the policy has changed. If there are multiple optimal policies, the policy may keep switching between them indefinitely, even though an optimal policy has already been found.

See Exercise 4.4 in the 2018 edition of Sutton & Barto's book, which explicitly points out this bug in the pseudocode.
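One possible fix, shown here as a minimal sketch and not as the repository's actual code, is to mark the policy as unstable only when some action is strictly better than the current one, so ties between equally good actions never trigger a change. The function names, the tolerance `tol`, and the toy MDP below are assumptions made for illustration; the transition table `P[s][a]` is assumed to be a list of `(prob, next_state, reward, done)` tuples.

```python
def action_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + (0.0 if done else gamma * V[s2]))
               for p, s2, r, done in P[s][a])


def policy_eval(P, policy, gamma=0.9, theta=1e-10):
    """Iterative evaluation of a deterministic policy (dict: state -> action)."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = action_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration(P, gamma=0.9, tol=1e-8):
    """Policy iteration whose stability test compares action values, not actions."""
    policy = {s: next(iter(P[s])) for s in P}  # start with the first listed action
    while True:
        V = policy_eval(P, policy, gamma)
        policy_stable = True
        for s in P:
            q = {a: action_value(P, V, s, a, gamma) for a in P[s]}
            best = max(q, key=q.get)
            # Switch only on a strict improvement (beyond tol); an action that
            # merely ties with the current one leaves the policy untouched.
            if q[best] > q[policy[s]] + tol:
                policy[s] = best
                policy_stable = False
        if policy_stable:
            return policy, V


# Hypothetical toy MDP in which both actions are equally good in every state.
P = {
    0: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 1, 1.0, True)], 1: [(1.0, 1, 1.0, True)]},
}
policy, V = policy_iteration(P)
```

Because every policy change now corresponds to a strict increase in value for some state, and a finite MDP has finitely many deterministic policies, the loop cannot cycle between equally good policies.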

link2xt commented 5 years ago

See also the related issue #202 about the naming of the function.