Wrong Bellman equation for Jack's car rental problem?

ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction

MIT License


Wrong Bellman equation for Jack's car rental problem? #154

Closed by Raymondliz 2 years ago

Raymondliz commented 2 years ago

In car_rental_synchronous.py, in the function bellman(), expected_return is accumulated through

expected_return += prob_ * (reward + self.gamma * values[num_of_cars_first_loc_, num_of_cars_second_loc_])

prob_ represents p(s'|s,a), and I don't see anything wrong with that. But in the correct Bellman equation the reward should be weighted by p(r|s,a), not p(s'|s,a). So is the code wrong?

Raymondliz commented 2 years ago

I get the idea behind it now. Thanks.
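
The thread does not spell out the resolution. One likely reading is that prob_ is the probability of a joint outcome that determines both the reward and the next state, so the accumulated sum is the four-argument form sum over (s', r) of p(s', r | s, a) * (r + gamma * v(s')). By linearity of expectation this equals sum over r of p(r|s,a) * r plus gamma times sum over s' of p(s'|s,a) * v(s'), which is the form the question expects. The sketch below checks that identity on a toy example. All names and numbers in it are hypothetical and are not taken from the repository, and it makes no claim about what prob_ actually covers in car_rental_synchronous.py.

```python
import math

# Hypothetical toy numbers for one fixed (state, action) pair.
GAMMA = 0.9
values = {"s1": 1.0, "s2": 5.0}  # current estimates v(s')

# Joint distribution p(s', r | s, a): each outcome fixes a next state and a reward.
joint = {
    ("s1", 0.0): 0.1,
    ("s1", 10.0): 0.3,
    ("s2", 0.0): 0.2,
    ("s2", 10.0): 0.4,
}


def joint_backup(joint, values, gamma):
    """Accumulate p(s', r | s, a) * (r + gamma * v(s')) over joint outcomes."""
    total = 0.0
    for (s_next, reward), prob in joint.items():
        total += prob * (reward + gamma * values[s_next])
    return total


def marginal_backup(joint, values, gamma):
    """Weight rewards by p(r | s, a) and values by p(s' | s, a) separately."""
    p_reward = {}
    p_state = {}
    for (s_next, reward), prob in joint.items():
        p_reward[reward] = p_reward.get(reward, 0.0) + prob
        p_state[s_next] = p_state.get(s_next, 0.0) + prob
    expected_reward = sum(p * r for r, p in p_reward.items())
    expected_value = sum(p * values[s] for s, p in p_state.items())
    return expected_reward + gamma * expected_value


a = joint_backup(joint, values, GAMMA)
b = marginal_backup(joint, values, GAMMA)
print(math.isclose(a, b))  # the two forms agree up to floating-point error
```

The joint form mirrors the shape of the accumulation line quoted in the question: one probability multiplies the reward and the discounted value together, which is valid whenever that probability belongs to an outcome that fixes both.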