awjuliani / successor_examples

Tutorials on learning and using successor representations.
MIT License

updating reward #1

Open devloper13 opened 4 years ago

devloper13 commented 4 years ago

When you call the reward update function, you pass experience[-1]. Shouldn't it be experience[-2]? When updating the state dynamics, we already take S, A, R, S' from experience[-2], and we use experience[-1] only to get A'.
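To make the indexing concrete, here is a minimal sketch of which transition each index selects. The tuple layout `(state, action, reward, next_state)` and the helper names `sarsa_tuple` and `reward_sample` are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical buffer: each entry is assumed to be
# (state, action, reward, next_state). Oldest first, newest last.
experiences = [
    (0, 1, 0.0, 1),  # earlier transition: S=0, A=1, R=0.0, S'=1
    (1, 0, 1.0, 2),  # newest transition:  S=1, A=0, R=1.0, S'=2
]


def sarsa_tuple(experiences):
    """Build (S, A, R, S', A') for a SARSA-style update.

    S, A, R, S' come from the second-to-last entry; only the
    action A' is read from the last entry.
    """
    state, action, reward, next_state = experiences[-2]
    next_action = experiences[-1][1]
    return state, action, reward, next_state, next_action


def reward_sample(experiences, index):
    """Return the (next_state, reward) pair stored at the given index."""
    _, _, reward, next_state = experiences[index]
    return next_state, reward


print(sarsa_tuple(experiences))        # transition used for the dynamics update
print(reward_sample(experiences, -2))  # same transition as the dynamics update
print(reward_sample(experiences, -1))  # the newer transition instead
```

Under this assumed layout, index -2 gives the reward update the same transition as the state dynamics update, while index -1 gives it the transition that follows.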