dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
http://www.wildml.com/2016/10/learning-reinforcement-learning/
MIT License
20.45k stars 6.02k forks

policy_improvement() should be renamed to policy_iteration() #202

Open link2xt opened 5 years ago

link2xt commented 5 years ago

In the DP directory there is a notebook, Policy Iteration.ipynb. It contains a function policy_improvement() that returns the optimal policy and its value function. In the book this algorithm is called "Policy Iteration" (see p.80), while policy improvement is just the third step inside it.
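To make the naming distinction concrete, here is a minimal sketch of how the two names could be separated. It is not the notebook's code: the toy MDP `P`, the deterministic policy representation (a list of action indices), and the helper names are assumptions made for illustration. Only the transition layout, `P[s][a]` as a list of `(prob, next_state, reward, done)` tuples, follows the Gym-style convention.

```python
# Hypothetical 3-state chain MDP (an assumption for this sketch, not from the repo).
# States 0 and 1 are non-terminal, state 2 is terminal.
# Actions: 0 = left, 1 = right. Every non-terminal step costs -1.
P = {
    0: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def q_value(s, a, V, P, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(
        prob * (reward + gamma * (0.0 if done else V[s2]))
        for prob, s2, reward, done in P[s][a]
    )


def policy_evaluation(policy, P, gamma=0.9, theta=1e-10):
    """Step 2 in the book: iterate until V approximates v_pi for a fixed policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(s, policy[s], V, P, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_improvement(V, P, gamma=0.9):
    """Step 3 in the book: a single greedy update of the policy with respect to V."""
    policy = []
    for s in range(len(P)):
        q = [q_value(s, a, V, P, gamma) for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def policy_iteration(P, gamma=0.9):
    """The full algorithm: alternate evaluation and improvement until stable."""
    policy = [0] * len(P)  # step 1: arbitrary initial policy
    while True:
        V = policy_evaluation(policy, P, gamma)
        new_policy = policy_improvement(V, P, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P, gamma=0.9)
print(policy)  # [1, 1, 0]
```

In this sketch `policy_improvement()` performs only one greedy step, while `policy_iteration()` is the outer loop that returns the optimal policy and its value function, which is what the notebook's function currently does under the other name.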