JuliaReinforcementLearning / ReinforcementLearningAnIntroduction.jl

Julia code for the book Reinforcement Learning An Introduction
https://juliareinforcementlearning.org/ReinforcementLearningAnIntroduction.jl/
MIT License

TD time step parameter #87

Open baedan opened 2 years ago

baedan commented 2 years ago

Multi-step TD currently has an off-by-one step parameter (JuliaReinforcementLearning/ReinforcementLearning.jl#648).
https://github.com/JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl/blob/e83f54055d621dbc44b205d2016c6868abf4b4a1/notebooks/Chapter09_Random_Walk.jl#L193-L216

For example, `n` is meant to be the number of time steps, but it currently corresponds to the number of time steps plus one. `run_once(1, α)` is therefore not TD(0), which has a step parameter of 1, but a 2-step TD method. Depending on how the upstream issue is resolved, an update might be needed here.
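To make the off-by-one concrete, here is a minimal Python sketch of the textbook n-step return, G = R_{t+1} + γR_{t+2} + ... + γ^(n-1)R_{t+n} + γ^n V(S_{t+n}). It is not the notebook's or the upstream package's code: the function names, the toy `rewards` and `values` lists, and the `off_by_one_return` helper are all hypothetical and only illustrate the behavior described above.

```python
def n_step_return(rewards, values, t, n, gamma=1.0):
    """Textbook n-step return starting from time t.

    rewards[k] holds R_{k+1}, the reward received after leaving state S_k.
    values[k] holds V(S_k); values has one more entry than rewards, and the
    last entry (the terminal state) is assumed to be 0.
    """
    T = len(rewards)
    end = min(t + n, T)  # truncate at the end of the episode
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    g += gamma ** (end - t) * values[end]  # bootstrap from V(S_{t+n})
    return g


def off_by_one_return(rewards, values, t, n, gamma=1.0):
    """Hypothetical stand-in for the reported behavior: n acts as n + 1."""
    return n_step_return(rewards, values, t, n + 1, gamma)


# Toy episode (made-up numbers): three transitions, terminal value 0.
rewards = [1.0, 2.0, 4.0]
values = [10.0, 20.0, 30.0, 0.0]

td0_target = n_step_return(rewards, values, t=0, n=1)       # R_1 + V(S_1)
two_step_target = n_step_return(rewards, values, t=0, n=2)  # R_1 + R_2 + V(S_2)
shifted_target = off_by_one_return(rewards, values, t=0, n=1)
```

With these numbers, the `n=1` target bootstraps from `V(S_1)` as TD(0) does, while the shifted version called with `n=1` produces the 2-step target instead.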