ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction
MIT License

chap1/tic_tac_toe.py: why is td_error set to zero when exploring? #125

Closed GarfieldF closed 4 years ago

GarfieldF commented 4 years ago

td_error = self.greedy[i] * ( self.estimations[states[i + 1]] - self.estimations[state] )

When exploring, the greedy flag is set to False:

    if np.random.rand() < self.epsilon:
        action = next_positions[np.random.randint(len(next_positions))]
        action.append(self.symbol)
        self.greedy[-1] = False

Since td_error is multiplied by self.greedy[i], this zeroes the update for that step. Wouldn't that make exploration meaningless?

ShangtongZhang commented 4 years ago

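No, because the TD errors for the transitions after the exploration step are not zero. Only the update for the state where the exploratory move was taken is skipped; the states reached afterwards are still updated as usual.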
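
To make this concrete, here is a minimal sketch of the masked backup on a toy episode. It is not the repository's code: the `backup` function signature, the state names `s0`..`s3`, the starting values, and the step size are all made up for illustration, and it assumes `greedy[i]` marks whether the move taken from `states[i]` was greedy.

```python
def backup(estimations, states, greedy, step_size=0.1):
    """TD(0)-style backup over one episode, walking backwards.

    Hypothetical helper: estimations maps state -> value estimate,
    states is the visited sequence, and greedy[i] is False when the
    move taken from states[i] was exploratory.
    """
    for i in reversed(range(len(states) - 1)):
        state = states[i]
        # The mask zeroes the TD error only for the exploratory step.
        td_error = greedy[i] * (estimations[states[i + 1]] - estimations[state])
        estimations[state] += step_size * td_error
    return estimations


# Toy episode (illustrative values): the move out of s1 was exploratory.
values = {"s0": 0.5, "s1": 0.6, "s2": 0.4, "s3": 1.0}
states = ["s0", "s1", "s2", "s3"]
greedy = [True, False, True, True]

backup(values, states, greedy)

for name in states:
    print(name, round(values[name], 4))
```

In this toy run the estimate for `s1` is left untouched, while `s2`, which was only reached because of the exploratory move, is still pulled toward the final outcome. So exploration still pays off: it exposes states whose values then get learned from the transitions that follow.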