Open Perseus1993 opened 1 year ago
Since we get a reward for reaching the right terminal state, we can set its value to 1 rather than having a reward function where reward = 1 if cur_state == 6. This trick of setting all rewards to 0 was used in the Gambler's Problem (Example 4.3) too.
Why is the reward 0 in every state?
Shouldn't it be reward = 1 if cur_state == 6 else 0?
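
One way to check whether the two formulations differ is to run both with the same random seed. The sketch below is only an illustration, not the repository's code: it assumes a 7-state random walk (states 0 to 6, both ends terminal, every episode starting in state 3) and a plain TD(0) update, and the name `td0_random_walk` and its parameters are made up for this example.

```python
import random

def td0_random_walk(reward_on_exit, episodes=100, alpha=0.1, seed=0):
    """TD(0) on an assumed 7-state random walk (hypothetical helper).

    reward_on_exit=True : reward 1 on entering state 6, V[6] fixed at 0.
    reward_on_exit=False: reward 0 everywhere, V[6] fixed at 1 (the trick).
    """
    rng = random.Random(seed)
    values = [0.5] * 7                     # initial guess for states 1..5
    values[0] = 0.0                        # left terminal state
    values[6] = 0.0 if reward_on_exit else 1.0
    for _ in range(episodes):
        state = 3                          # assumed start state
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if (reward_on_exit and next_state == 6) else 0.0
            # Undiscounted TD(0) update; terminal values are never updated.
            values[state] += alpha * (reward + values[next_state] - values[state])
            state = next_state
    return values[1:6]                     # non-terminal states only

explicit = td0_random_walk(reward_on_exit=True)
trick = td0_random_walk(reward_on_exit=False)
print(explicit)
print(trick)
```

Under these assumptions the TD target on the final step to the right is `1 + 0` in the first version and `0 + 1` in the second, so every update to a non-terminal state is the same and both calls return the same estimates. The only difference is whether the 1 is stored as a reward or as the fixed value of state 6.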