Open JesseFarebro opened 4 years ago
I wouldn't expect this to be a big driver of performance, no. The ALE determinism has always been brittle at best -- going through saveState/loadState should provide a more robust route to reproducibility. Thanks for flagging this!
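The save/restore suggestion can be sketched with a toy stand-in for the emulator (a hypothetical `ToyEnv` class, not the real ALE API): snapshot everything that influences future transitions, and restoring the snapshot reproduces the trajectory exactly, independent of how brittle the reset path is.

```python
import random

class ToyEnv:
    """Hypothetical emulator stand-in: state = (RNG state, one RAM byte)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.ram = 0

    def step(self):
        # Deterministic given the full internal state.
        self.ram = (self.ram + self._rng.randrange(256)) % 256
        return self.ram

    def save_state(self):
        # Snapshot everything that influences future transitions.
        return (self._rng.getstate(), self.ram)

    def load_state(self, snapshot):
        rng_state, ram = snapshot
        self._rng.setstate(rng_state)
        self.ram = ram

env = ToyEnv(seed=42)
env.step()
snap = env.save_state()

run1 = [env.step() for _ in range(5)]
env.load_state(snap)
run2 = [env.step() for _ in range(5)]
assert run1 == run2  # restoring the snapshot replays the exact trajectory
```

The point is only that snapshot/restore sidesteps the reset-path question entirely: any state reachable once is reachable again, bit for bit.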
There was an issue raised (https://github.com/openai/gym/issues/1777) which describes differences between `v0.5.2` and `v0.6.0` of the ALE. I traced some of the issues to this commit https://github.com/mgbellemare/Arcade-Learning-Environment/commit/7bff96b4b64edcffbeb2d9bb83b1685ab506ea2b#diff-d9d868097a7403416e6ef352d95dc4feR178, which changes how `StellaEnvironment::softReset` works. The `RESET` action is applied `m_num_reset` times, which leads to a different starting state for the agent. Perhaps this was intended behaviour in `StellaEnvironment::reset`, but it has unintended consequences in `StellaEnvironment::softReset`.

For example, here are the starting states for Ms. Pacman in ALE `v0.5.2` and `v0.6.0`. Note that if you emulate one additional `RESET` action from the `v0.6.0` state, you get the `v0.5.2` starting state.

Ms. Pacman, ALE v0.5.2

Ms. Pacman, ALE v0.6.0

You can see subtle changes between these two frames (e.g., the colour of the ghosts in jail).
I haven't looked into why we repeatedly call `RESET`. Should this be investigated further? It doesn't seem like it should affect asymptotic performance.