Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
In the DP directory there is a Policy Iteration.ipynb. It contains function policy_improvement() which returns optimal policy and its value function. In the book this algorithm is called "Policy Iteration" (see p.80), while policy improvement is just a 3rd step inside of it.
In the
DP
directory there is aPolicy Iteration.ipynb
. It contains functionpolicy_improvement()
which returns optimal policy and its value function. In the book this algorithm is called "Policy Iteration" (see p.80), while policy improvement is just a 3rd step inside of it.