Open devloper13 opened 4 years ago
When you call the reward update function, you pass experience[-1]. Shouldn't it be experience[-2]? We currently read S, A, R, S' from experience[-2], even when updating the state dynamics, and use experience[-1] only to get A'.
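To make the indexing concrete, here is a minimal sketch of what I mean. The `Step` record and the `sarsa_transition` helper are hypothetical names I made up for illustration, and the field layout is an assumption that may not match the actual code in the repo. The point is only that S, A, R, S' all come from the second-to-last entry, while the last entry contributes just A'.

```python
from collections import namedtuple

# Hypothetical record layout; the real project's fields may differ.
Step = namedtuple("Step", ["state", "action", "reward", "next_state"])


def sarsa_transition(experience):
    """Return (S, A, R, S', A') from the last two recorded steps."""
    if len(experience) < 2:
        raise ValueError("need at least two steps to know A'")
    prev, last = experience[-2], experience[-1]
    # S, A, R, S' come from experience[-2]; only A' comes from experience[-1].
    return prev.state, prev.action, prev.reward, prev.next_state, last.action


experience = [
    Step(state=0, action="right", reward=0.0, next_state=1),
    Step(state=1, action="up", reward=1.0, next_state=2),
]
s, a, r, s_next, a_next = sarsa_transition(experience)
```

Under this layout, passing experience[-1] to the reward update would feed it the reward of the step that has only just been taken, not the R that belongs with S and A.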