Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Hello,
Is there any benefit to including a vanilla REINFORCE implementation for people trying to learn the concepts? REINFORCE with Baseline includes a value function approximator, which makes it look a lot like Actor-Critic.
I think seeing a pure policy gradient method would be useful as a learning tool. Otherwise, people may assume that policy gradient methods must include some kind of value function approximation.
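To make the request concrete, here is a rough sketch of what I mean by "vanilla" REINFORCE: a tabular softmax policy updated with Monte Carlo returns and no value function anywhere. The corridor environment, the constants, and all function names are assumptions made up for illustration and are not taken from this repository. It uses only the standard library so the update rule stays visible.

```python
import math
import random

# Hypothetical toy task: a corridor of states 0..4, state 4 is terminal.
N_STATES = 5
ACTIONS = (-1, +1)   # index 0 moves left, index 1 moves right
GAMMA = 0.99         # discount factor (assumed value)
ALPHA = 0.01         # step size (assumed value)
MAX_STEPS = 50       # cap on episode length


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng):
    """Sample one episode; returns a list of (state, action_index, reward)."""
    state, trajectory = 0, []
    for _ in range(MAX_STEPS):
        probs = softmax(theta[state])
        action = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        trajectory.append((state, action, -1.0))  # reward of -1 per step
        state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        if state == N_STATES - 1:
            break
    return trajectory


def reinforce_update(theta, trajectory):
    """Monte Carlo policy gradient step: no baseline, no critic."""
    # Compute the discounted return G_t for every time step, back to front.
    G, returns = 0.0, []
    for _, _, reward in reversed(trajectory):
        G = reward + GAMMA * G
        returns.append(G)
    returns.reverse()
    for t, ((s, a, _), G_t) in enumerate(zip(trajectory, returns)):
        probs = softmax(theta[s])
        for b in range(len(ACTIONS)):
            # Gradient of log softmax w.r.t. preference b: 1[b == a] - pi(b|s)
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += ALPHA * (GAMMA ** t) * G_t * grad_log


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        reinforce_update(theta, run_episode(theta, rng))
    return theta


if __name__ == "__main__":
    theta = train()
    print("P(right | state 0) =", softmax(theta[0])[1])
```

The only learned quantity is `theta`, the policy parameters. Adding a baseline would mean subtracting a learned state value from `G_t` in `reinforce_update`, which is exactly the part I think is worth showing separately.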