Closed slinlee closed 2 years ago
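The final reward values of each episode were being added cumulatively on top of those from the previous episodes, instead of being reported per episode.
Using the mouse and cheese example, this is the logged output before and after the fix: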
Before

| Episode | reward_0_found_cheese |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 10 |

After

| Episode | reward_0_found_cheese |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
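
For readers who want to see how this kind of bug arises, here is a minimal sketch that reproduces the before and after numbers. It is not the project's actual code: `BuggyRewardLogger`, `FixedRewardLogger`, and `run` are hypothetical names, and the sketch assumes the cause was a per-episode total that was never reset when an episode ended.

```python
from collections import defaultdict


class BuggyRewardLogger:
    """Hypothetical logger whose running totals are never cleared."""

    def __init__(self):
        self.totals = defaultdict(int)  # reward name -> running total
        self.rows = []                  # one (episode, totals) entry per episode

    def add_reward(self, name, value):
        self.totals[name] += value

    def end_episode(self, episode):
        # Bug: the totals are recorded but carried over into the next episode.
        self.rows.append((episode, dict(self.totals)))


class FixedRewardLogger(BuggyRewardLogger):
    """Same logger, but the totals are reset once the episode is recorded."""

    def end_episode(self, episode):
        super().end_episode(episode)
        self.totals.clear()  # start the next episode from zero


def run(logger, episodes=10):
    """Simulate episodes in which the mouse finds the cheese exactly once."""
    for episode in range(episodes):
        logger.add_reward("reward_0_found_cheese", 1)
        logger.end_episode(episode)
    return [(ep, totals["reward_0_found_cheese"]) for ep, totals in logger.rows]


if __name__ == "__main__":
    print("Before:", run(BuggyRewardLogger()))
    print("After: ", run(FixedRewardLogger()))
```

With the buggy logger the reported value grows by one each episode, while the fixed logger reports 1 for every episode, matching the two outputs in this description.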