dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
http://www.wildml.com/2016/10/learning-reinforcement-learning/
MIT License

Question about linear value function approximation exercises #100

Open ronaldseoh opened 7 years ago

ronaldseoh commented 7 years ago

Hi, I've recently been working on the function approximation exercises. Q-learning with FA (I also tried SARSA) runs fine for the default 100 episodes, but with 1000+ episodes it frequently gets stuck at a reward of -200 for a long time, usually until the end of training. In other runs of the same algorithm, however, training progresses without getting stuck at -200. The solution version behaves the same way. Is this behavior expected (perhaps because we are using SGD), or is something actually wrong in the code?

[Attached plots: sarsa_1000_warm_02, stuck_01, sarsa_1000_02]
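To compare runs more precisely than by eyeballing the plots, the per-episode rewards could be summarized with a small helper like the one below. This is only a sketch: `stuck_summary` and its `floor` parameter are hypothetical names, not part of the repository, and it assumes -200 is the lowest reward an episode can return.

```python
def stuck_summary(episode_rewards, floor=-200.0):
    """Summarize how often a training run sits at the reward floor.

    `floor` is an assumption: the lowest total reward one episode can give.
    """
    at_floor = [r <= floor for r in episode_rewards]

    # Track the longest consecutive streak of episodes at the floor.
    longest = current = 0
    for flag in at_floor:
        current = current + 1 if flag else 0
        longest = max(longest, current)

    n = len(at_floor)
    return {
        "fraction_at_floor": sum(at_floor) / n if n else 0.0,
        "longest_streak": longest,
        "ends_at_floor": bool(at_floor) and at_floor[-1],
    }


# Made-up rewards for illustration only, not results from the exercises.
rewards = [-200, -200, -150, -120, -200, -200, -200, -110]
summary = stuck_summary(rewards)
print(summary)
```

Running this over several seeds and comparing `longest_streak` and `ends_at_floor` would show whether the stuck runs are occasional or systematic.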

anhtran1995 commented 6 years ago

I got similar results. Did you figure out if there is something wrong with the code?

DylanHaiyangChen commented 6 years ago

I also got the same results. I think it may be due to differing package versions, for example of scikit-learn.
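One way to check the version hypothesis is for everyone to post the package versions they ran with. A minimal sketch using only the standard library follows. The package list is an assumption based on the libraries named in the repository description, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of relevant packages; adjust to match your environment.
report = installed_versions(["scikit-learn", "numpy", "gym", "tensorflow"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

If a stuck run and a clean run report identical versions, the package-version explanation becomes less likely.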