Update DP exercise policy evaluation solution

dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

http://www.wildml.com/2016/10/learning-reinforcement-learning/

MIT License

20.45k stars 6.02k forks

Update DP exercise policy evaluation solution #230

Closed: ugrkm closed this 4 years ago

ugrkm commented 4 years ago

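According to the iterative policy evaluation algorithm, every state update within a sweep must use the values from the previous sweep (the old values). In the old version of the solution, however, the state-value function is updated in place, so states updated later in a sweep already see values that were changed earlier in that same sweep.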
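To illustrate the point, here is a minimal sketch of the two-array (synchronous) update, not the repository's actual solution. It assumes transitions are given as `P[s][a]`, a list of `(prob, next_state, reward, done)` tuples, and that `policy` is an array of shape `[n_states, n_actions]`; the function name `policy_eval_sync` and the three-state chain are made up for this example.

```python
import numpy as np

def policy_eval_sync(policy, P, gamma=1.0, theta=1e-8):
    """Iterative policy evaluation that reads only the previous sweep's values."""
    n_states = policy.shape[0]
    V = np.zeros(n_states)
    while True:
        # Fresh array for this sweep, so V (the old values) is never overwritten mid-sweep.
        V_new = np.zeros(n_states)
        for s in range(n_states):
            for a, action_prob in enumerate(policy[s]):
                for prob, next_state, reward, done in P[s][a]:
                    # Bellman expectation backup using the OLD value of next_state.
                    V_new[s] += action_prob * prob * (reward + gamma * V[next_state])
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:
            return V

# Hypothetical 3-state chain: 0 -> 1 -> 2, reward -1 per step, state 2 is absorbing.
P = {
    0: {0: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)]},
}
policy = np.ones((3, 1))  # a single action, always taken
V = policy_eval_sync(policy, P)
```

On this toy chain the values should settle at -2, -1 and 0 for states 0, 1 and 2. An in-place variant would write into `V[s]` directly instead of `V_new[s]`, which is the behaviour this comment describes in the old solution.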