Sutton's book defines the e-greedy policy as such (pages 27-28, 2nd edition):
> A simple alternative is to behave greedily most of the time, but every once in a while, say with small probability epsilon, instead select randomly from among all the actions with equal probability, independently of the action-value estimates.
The implementation of Q-Learning in this repository had these probabilities inverted (it explored with probability 1 - epsilon and acted greedily with probability epsilon), so I have fixed that.
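For reference, a minimal sketch of the epsilon-greedy selection as Sutton defines it (the function name and signature are illustrative, not taken from this repository):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=np.random.default_rng()):
    """Pick an action epsilon-greedily: with probability epsilon choose
    uniformly at random among all actions, otherwise choose the action
    with the highest estimated value."""
    if rng.random() < epsilon:
        # Explore: uniform random over ALL actions, ignoring the estimates.
        return int(rng.integers(len(q_values)))
    # Exploit: greedy with respect to the current action-value estimates.
    return int(np.argmax(q_values))
```

Note the direction of the comparison: the random branch fires with probability epsilon, not 1 - epsilon, which is exactly the point of this fix.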