Closed caojilin closed 3 years ago
Thanks for bringing this up. I've looked into it and resolved the issue.
The bug was in the SARSA implementation. In the training loop, the reward was overwritten after calling env.step(), but the new value did not persist. As a result, the reward used in the update was always zero, so every state-action pair was assigned a value of zero.
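For anyone hitting the same symptom, here is a minimal sketch of a SARSA loop in which the adjusted reward is the value that actually reaches the update. It is not the code from FrozenLake_v0.py: `ChainEnv` is a hypothetical stand-in environment so the example runs without gym, and the `-0.01` step penalty and the hyperparameters are assumptions chosen only for illustration.

```python
import random


class ChainEnv:
    """Hypothetical stand-in for FrozenLake: walk right along a chain to the goal."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clamped at the ends).
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    best = max(q[state])
    # Break ties randomly so an all-zero table does not always pick action 0.
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def train_sarsa(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(100):  # step cap so early random episodes still end
            next_state, reward, done = env.step(action)
            # Assumed reward shaping: overwrite the same variable that the
            # update below reads, so the new value is not lost.
            if not done:
                reward = -0.01
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # SARSA target uses the action actually chosen in the next state.
            target = reward + (0.0 if done else gamma * q[next_state][next_action])
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
            if done:
                break
    return q
```

A quick sanity check after training is to confirm that the Q-table is not all zeros, for example `any(v != 0.0 for row in train_sarsa() for v in row)`. If that comes back false, the reward is probably not reaching the update.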
After running FrozenLake_v0.py, the agent didn't learn anything. I'm wondering which part is wrong.