ch06 random_walk td method

ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction

MIT License


ch06 random_walk td method #157

Open Perseus1993 opened 1 year ago

Perseus1993 commented 1 year ago

Why is reward = 0 in every state?

Shouldn't it be reward = 1 if cur_state == 6 else 0?

kevroi commented 1 year ago

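Since the only nonzero reward comes from reaching the right terminal state, we can fix that state's value at 1 and keep every reward at 0, rather than using a reward function where reward = 1 if cur_state == 6. The undiscounted TD target is reward + V(next state), so a step into the right terminal state gives 0 + 1 with this trick and 1 + 0 with the explicit reward, which is the same update. The same trick of setting all rewards to 0 is used in the Gambler's Problem (Example 4.3) too.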
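
To see the equivalence concretely, here is a minimal sketch, not the repository's code. It assumes tabular TD(0) with no discounting, states 0 to 6 with 0 and 6 terminal, a start in state 3, and a step size of 0.1. The function name td0_random_walk and its parameters are made up for illustration.

```python
import random


def td0_random_walk(terminal_value_trick, episodes=100, alpha=0.1, seed=0):
    """Hypothetical helper: run TD(0) on the 5-state random walk.

    terminal_value_trick=True  -> all rewards are 0, V[6] is fixed at 1
    terminal_value_trick=False -> reward 1 on entering state 6, V[6] is 0
    """
    rng = random.Random(seed)  # same seed -> same trajectories in both modes
    # States 0 and 6 are terminal; non-terminal states start at 0.5.
    values = [0.0] + [0.5] * 5 + [0.0]
    if terminal_value_trick:
        values[6] = 1.0

    for _ in range(episodes):
        state = 3
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            if terminal_value_trick:
                reward = 0.0
            else:
                reward = 1.0 if next_state == 6 else 0.0
            # Undiscounted TD(0) update; terminal values are never updated
            # because the loop exits as soon as a terminal state is reached.
            values[state] += alpha * (reward + values[next_state] - values[state])
            state = next_state

    return values[1:6]  # estimates for the five non-terminal states


with_trick = td0_random_walk(terminal_value_trick=True)
with_reward = td0_random_walk(terminal_value_trick=False)
print(with_trick)
print(with_reward)
```

Both calls follow the same random trajectories, and the only transition where they differ is the step into state 6, where the target is 0 + 1 in one case and 1 + 0 in the other, so the two printed lists should match.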