Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
According to the iterative policy evaluation algorithm, all of the next iterations must use the old values. But in the old version of the solution, the state-value function keeps updating itself.
According to the iterative policy evaluation algorithm, all of the next iterations must use the old values. But in the old version of the solution, the state-value function keeps updating itself.