Closed by Raymondliz 2 years ago
In car_rental_synchronous.py, in the function bellman(), expected_return is accumulated like this:
expected_return += prob_ * (reward + self.gamma * values[num_of_cars_first_loc_, num_of_cars_second_loc_])
prob_ represents p(s'|s,a), which I believe is fine. But for the Bellman equation to be correct, reward should be weighted by p(r|s,a), not p(s'|s,a). So is this line wrong?
I understand the idea behind it now. Thanks.
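For anyone landing here with the same question, the following is a minimal sketch of why a single weight can be enough. It assumes that each enumerated outcome fixes both the reward and the next state, so the weight acts as the joint probability p(s', r | s, a) from the standard Bellman equation v(s) = sum over s', r of p(s', r | s, a) * [r + gamma * v(s')]. The toy outcomes, the state names "A" and "B", and the helpers joint_backup and marginal_backup are hypothetical and are not taken from the repository.

```python
from collections import defaultdict

GAMMA = 0.9

# Hypothetical toy dynamics for one fixed (s, a).
# Each outcome is (joint probability, reward, next state).
outcomes = [
    (0.2, 10.0, "A"),
    (0.3, 10.0, "B"),
    (0.1, 20.0, "A"),
    (0.4, 20.0, "B"),
]
values = {"A": 5.0, "B": 1.0}  # current value estimates, also made up


def joint_backup(outcomes, values, gamma):
    """Weight (r + gamma * v(s')) by the joint probability p(s', r | s, a)."""
    return sum(p * (r + gamma * values[s_next]) for p, r, s_next in outcomes)


def marginal_backup(outcomes, values, gamma):
    """Weight r by p(r | s, a) and v(s') by p(s' | s, a) separately."""
    p_r = defaultdict(float)
    p_s = defaultdict(float)
    for p, r, s_next in outcomes:
        p_r[r] += p       # marginal over rewards
        p_s[s_next] += p  # marginal over next states
    expected_reward = sum(p * r for r, p in p_r.items())
    expected_value = sum(p * values[s] for s, p in p_s.items())
    return expected_reward + gamma * expected_value


print(joint_backup(outcomes, values, GAMMA))
print(marginal_backup(outcomes, values, GAMMA))
```

Both functions return the same number up to floating-point rounding, because expectation is linear: summing the joint probabilities over s' gives p(r | s, a), and summing them over r gives p(s' | s, a). So accumulating prob_ * (reward + gamma * value) over all outcomes is consistent with the Bellman equation under the assumption stated above.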