Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Hi, I've recently been working on the function approximation exercises. The Q-learning algorithm with function approximation (I tried SARSA as well) runs fine for the default 100 episodes, but for 1000+ episodes it frequently gets stuck at a -200 reward for quite a long time, often until the end of training. In other cases, however (separate runs of the same algorithm), training progresses without getting stuck at -200. This was the case for the solution version as well. I was wondering if this behavior is to be expected (maybe because we are using SGD?), or whether there is actually something wrong in the code.
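For context, here is a minimal sketch of the kind of setup being described: Q-learning with linear function approximation (SGD-trained regressors over RBF features) on MountainCar-v0. The environment name, hyperparameters, and helper names below are assumptions for illustration, not the repo's exact solution; the -200 floor corresponds to an episode that hits the 200-step limit with -1 reward per step, i.e. the car never reaches the goal. The sketch uses the classic gym API (`reset()` returns an observation, `step()` returns a 4-tuple).

```python
# Sketch only: Q-learning with linear FA on MountainCar-v0 (assumed env).
# Hyperparameters and helper names are illustrative, not the repo's exact code.
import numpy as np
import gym
import sklearn.pipeline
import sklearn.preprocessing
from sklearn.linear_model import SGDRegressor
from sklearn.kernel_approximation import RBFSampler

env = gym.make("MountainCar-v0")

# Fit state scaling and RBF features on a sample of observations.
samples = np.array([env.observation_space.sample() for _ in range(10000)])
scaler = sklearn.preprocessing.StandardScaler().fit(samples)
featurizer = sklearn.pipeline.FeatureUnion([
    ("rbf1", RBFSampler(gamma=5.0, n_components=100)),
    ("rbf2", RBFSampler(gamma=1.0, n_components=100)),
])
featurizer.fit(scaler.transform(samples))

def featurize(state):
    return featurizer.transform(scaler.transform([state]))[0]

# One SGD regressor per action approximates Q(s, a).
models = []
for _ in range(env.action_space.n):
    m = SGDRegressor(learning_rate="constant", eta0=0.01)
    m.partial_fit([featurize(env.reset())], [0.0])  # initialize weights
    models.append(m)

def q_values(state):
    feats = featurize(state)
    return np.array([m.predict([feats])[0] for m in models])

def run(num_episodes=1000, gamma=1.0, epsilon=0.1, epsilon_decay=0.995):
    for i in range(num_episodes):
        eps = epsilon * (epsilon_decay ** i)  # exploration decays over episodes
        state = env.reset()
        total, done = 0.0, False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_values(state)))
            next_state, reward, done, _ = env.step(action)
            total += reward
            # Q-learning TD target: bootstrap with the greedy next-state value.
            target = reward + (0.0 if done else gamma * np.max(q_values(next_state)))
            models[action].partial_fit([featurize(state)], [target])
            state = next_state
        print(f"episode {i}: return {total}")

if __name__ == "__main__":
    run()
```

With this kind of setup, runs that hover at -200 are ones where the agent has not yet reached the goal even once, so run-to-run variation (random feature sampling, SGD initialization, and how quickly epsilon decays) can plausibly explain why some instances plateau while others don't, though this is only a hypothesis about the behavior described above.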