ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction
MIT License

Some revision suggestions in Maximization_bias's Problem #86

Closed LBAWMY closed 6 years ago

LBAWMY commented 6 years ago

I just found that the plot generated by the original code doesn't match Figure 6.7 in Sutton's book.

This is Figure 6.7 in Sutton's book: [image]

However, this is the plot generated by the original code: [image]

I think the problem is in lines 116 and 117. I am changing them to the following code:

    left_counts_q = left_counts_q.mean(axis=0)
    left_counts_double_q = left_counts_double_q.mean(axis=0)

This is the new plot generated by the revised code: [image]

This output resembles Figure 6.7 in Sutton's book.
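To illustrate what the `mean(axis=0)` change does, here is a minimal sketch. It assumes the count arrays have shape (runs, episodes), with each entry marking whether the left action was taken in that episode of that run; the issue does not state the shape, and the toy data and the helper name `average_over_runs` are hypothetical.

```python
import numpy as np

def average_over_runs(left_counts):
    """Average a (runs, episodes) array over the runs axis (axis 0)."""
    left_counts = np.asarray(left_counts, dtype=float)
    return left_counts.mean(axis=0)

# Hypothetical toy data: 3 runs, 4 episodes.
# 1 means the left action was chosen in that episode, 0 means it was not.
toy_counts = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
])

# One value per episode: the fraction of runs that chose left.
left_fraction = average_over_runs(toy_counts)
print(left_fraction.shape)  # one entry per episode
```

Averaging over axis 0 keeps the episode axis, so the result can be plotted directly against the episode index, which is the kind of curve Figure 6.7 shows.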

Another suggestion: replace the original code in line 93 with the following, which breaks ties between equally valued actions at random:

    best_action = np.random.choice([action_ for action_, value_ in enumerate(active_q[next_state]) if value_ == np.max(active_q[next_state])])
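The same tie-breaking idea can be written as a small standalone helper. This is only a sketch: the function name `argmax_random_tie` and the optional `rng` parameter are assumptions for illustration and are not part of the repository.

```python
import numpy as np

def argmax_random_tie(values, rng=None):
    """Return the index of a maximal entry, chosen uniformly among ties."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Indices of every entry equal to the maximum value.
    best = np.flatnonzero(values == values.max())
    return int(rng.choice(best))

# Hypothetical action values with a tie between actions 1 and 2.
q_row = [0.0, 1.0, 1.0]
best_action = argmax_random_tie(q_row)
print(best_action)  # either 1 or 2
```

A plain `np.argmax` always returns the first maximal index, so when several action values are equal (for example, all zeros at the start of training) it would consistently favor the lowest-numbered action.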

ShangtongZhang commented 6 years ago

Would you like to make a PR?