epignatelli / reinforcement-learning-an-introduction

A Python implementation of the concepts in the book "Reinforcement Learning: An Introduction" by R. S. Sutton and A. G. Barto.
http://incompleteideas.net/book/the-book-2nd.html
MIT License

Redundant discount factor #1

Open c-lyu opened 1 year ago

c-lyu commented 1 year ago

Issue Description: The reproduction code for the Gridworld environment, located here, appears to apply the discount factor inconsistently in policy evaluation. In Sutton and Barto's book, the discount factor appears only inside the backup r + γV(s'); the book never multiplies the resulting value by γ a second time, as this code does here.

Expected Behavior:

Input π, the policy to be evaluated
Initialize an array V(s) = 0, for all s ∈ S^+
Repeat
    ∆ ← 0
    For each s ∈ S:
        v ← V(s)
        V(s) ← ∑_a π(a | s) ∑_{s', r} p(s', r | s, a) [r + γ V(s')]
        ∆ ← max(∆, |v − V(s)|)
until ∆ < θ (a small positive number)
Output V ≈ v_π
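For reference, the pseudocode above can be sketched in Python. This is a minimal sketch, not the repository's code: it assumes the 4x4 gridworld of the book's Example 4.1 (terminal states in two opposite corners, reward -1 on every transition, equiprobable random policy, γ = 1), and the helpers `step`, `random_policy`, and `policy_evaluation` are hypothetical names chosen for illustration. The point it shows is that γ is applied exactly once, inside r + γV(s').

```python
# Minimal sketch of iterative policy evaluation, following the pseudocode above.
# Assumed setup (not the repository's code): the 4x4 gridworld of the book's
# Example 4.1, states 0..15 in row-major order, states 0 and 15 terminal,
# reward -1 on every transition, equiprobable random policy.
N = 4
TERMINALS = {0, N * N - 1}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a):
    """Deterministic transition, so p(s', r | s, a) = 1 for the returned pair."""
    row, col = divmod(s, N)
    r2, c2 = row + a[0], col + a[1]
    if not (0 <= r2 < N and 0 <= c2 < N):
        r2, c2 = row, col  # moving off the grid leaves the state unchanged
    return r2 * N + c2, -1.0


def random_policy(s, a):
    """pi(a | s): every action is equally likely."""
    return 1.0 / len(ACTIONS)


def policy_evaluation(policy, gamma=1.0, theta=1e-6):
    V = [0.0] * (N * N)  # V(s) = 0 for all s in S+, terminals included
    while True:
        delta = 0.0
        for s in range(N * N):
            if s in TERMINALS:
                continue  # terminal values stay at 0
            v = V[s]
            new_v = 0.0
            for a in ACTIONS:
                s2, r = step(s, a)
                # gamma appears exactly once, inside r + gamma * V(s')
                new_v += policy(s, a) * (r + gamma * V[s2])
            V[s] = new_v  # in-place update, as in the pseudocode
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            return V


V = policy_evaluation(random_policy)
for row in range(N):
    print([round(x) for x in V[row * N:(row + 1) * N]])
```

With γ = 1, the rounded values should reproduce the converged grid of the book's Figure 4.1 (top row 0, -14, -20, -22).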

NathanZorndorf commented 11 months ago

I agree. The policy_evaluation function in the original codebase applies an extraneous discount factor. The discount factor belongs in the GridWorld.bellman_expectation function, and only there.
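To make the difference concrete, here is a hypothetical sketch contrasting the two placements. The names `bellman_expectation` and `policy_evaluation` echo this thread, but their signatures, the `extra_discount` flag, and the gridworld setup are assumptions for illustration, not the repository's code. Setting the flag multiplies the backup by γ a second time, which is one plausible reading of the redundant factor.

```python
# Hypothetical sketch: function names echo the issue thread, but signatures,
# the extra_discount flag, and the environment are assumptions.
N = 4
TERMINALS = {0, N * N - 1}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a):
    """Deterministic 4x4 gridworld move; off-grid moves stay put. Reward is -1."""
    row, col = divmod(s, N)
    r2, c2 = row + a[0], col + a[1]
    if not (0 <= r2 < N and 0 <= c2 < N):
        r2, c2 = row, col
    return r2 * N + c2, -1.0


def bellman_expectation(s, V, gamma):
    """sum_a pi(a|s) [r + gamma * V(s')] under an equiprobable random policy.

    This is the one place where gamma should be applied.
    """
    total = 0.0
    for a in ACTIONS:
        s2, r = step(s, a)
        total += (r + gamma * V[s2]) / len(ACTIONS)
    return total


def policy_evaluation(gamma, theta=1e-8, extra_discount=False):
    """In-place iterative policy evaluation.

    extra_discount=True mimics the redundant factor: the backup returned by
    bellman_expectation is multiplied by gamma a second time.
    """
    V = [0.0] * (N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            if s in TERMINALS:
                continue
            v = V[s]
            backup = bellman_expectation(s, V, gamma)
            V[s] = gamma * backup if extra_discount else backup
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            return V


correct = policy_evaluation(gamma=0.9)
buggy = policy_evaluation(gamma=0.9, extra_discount=True)
print(max(abs(c - b) for c, b in zip(correct, buggy)))  # nonzero: different fixed points

# With gamma = 1 the extra factor multiplies by 1, so both versions agree.
same = policy_evaluation(gamma=1.0)
also_same = policy_evaluation(gamma=1.0, extra_discount=True)
```

One observation from the sketch: with γ = 1, as in the book's Example 4.1, the extra multiplication is a no-op, so an undiscounted run would not expose the problem. For γ < 1 the buggy update is effectively V(s) ← γr + γ²V(s'), which converges to a different, less negative fixed point.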