Some revision suggestions in Maximization_bias's Problem

I just found that the plot generated by the original code doesn't match Figure 6.7 in Sutton's book.

This is Figure 6.7 in Sutton's book:

[image: Figure 6.7 from Sutton's book]

However, this is the plot generated by the original code:

[image: plot generated by the original code]

I think the problem is in lines 116 and 117. I am changing them to the following code:

    left_counts_q = left_counts_q.mean(axis=0)
    left_counts_double_q = left_counts_double_q.mean(axis=0)

This is the new plot generated by the revised code:

[image: plot generated by the revised code]

This output resembles Figure 6.7 in Sutton's book.
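To show what the `mean(axis=0)` change does, here is a minimal sketch. It assumes (I have not verified this against the repository) that `left_counts_q` is a 2-D array of shape `(runs, episodes)` holding 1 when the left action was taken from state A in that episode and 0 otherwise. The tiny array below is made-up data for illustration only.

```python
import numpy as np

# Hypothetical toy data: 2 runs x 3 episodes, 1 = "left" chosen from state A.
# The (runs, episodes) layout is an assumption about the original script.
left_counts_q = np.array([
    [1, 0, 1],
    [1, 1, 0],
])

# Averaging over axis 0 (the runs) gives one value per episode:
# the fraction of runs in which "left" was chosen in that episode.
left_fraction_q = left_counts_q.mean(axis=0)

print(left_fraction_q)  # [1.  0.5 0.5]
```

Under that assumption, the result has one entry per episode, which is the quantity on the y-axis of Figure 6.7 (percentage of left actions from A, plotted against episodes).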

Another suggestion (the following code replaces the original code at line 93), which breaks ties between equally valued actions at random:

    best_action = np.random.choice([action_ for action_, value_ in enumerate(active_q[next_state]) if value_ == np.max(active_q[next_state])])
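The same idea can be written as a small helper. This is only a sketch: the name `argmax_random_tie` and the `rng` parameter are hypothetical and not part of the repository. It relies on the fact that `np.argmax` always returns the first maximal index, so tied actions are never chosen evenly without explicit random tie-breaking.

```python
import numpy as np

def argmax_random_tie(values, rng=None):
    """Return the index of a maximal entry, breaking ties uniformly at random.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Indices of every entry equal to the maximum (computed once).
    best = np.flatnonzero(values == values.max())
    return int(rng.choice(best))

q_values = [0.0, 1.0, 1.0]
print(np.argmax(q_values))          # always 1, the first maximal index
print(argmax_random_tie(q_values))  # 1 or 2, chosen at random
```

Compared with the one-liner, this computes the maximum only once instead of once per action, but the selection behaviour is intended to be the same.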

ShangtongZhang / reinforcement-learning-an-introduction

Some revision suggestions in Maximization_bias's Problem #86