Farama-Foundation / Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.
GNU General Public License v2.0
2.14k stars 420 forks source link

unreliable initial state clone/restore in Qbert #412

Closed anadrome closed 3 years ago

anadrome commented 3 years ago

After noticing some wonkiness in tree-search code, I tried to cut it down to a minimal example, and got this reproducible case (works on both Linux and macOS for me, using both gcc and clang):

I tested all supported games, and this works as expected in all of them except Qbert, where the RAM the 2nd time through differs by a few bytes. I'm not quite sure where this is happening. Completely re-loading the ROM gives the expected results, so I don't think it's an nondeterminism issue like #246. Test program here: https://www.kmjn.org/temp/test_save_restore.cpp

JesseFarebro commented 3 years ago

Could you try with this PR applied: https://github.com/mgbellemare/Arcade-Learning-Environment/pull/401. I believe this should fix the issue, I haven't had time to write proper tests to merge this PR but it's on my list for the summer.

anadrome commented 3 years ago

I still get the same behavior with that PR applied. My guess is that #401 will fix some other tree-search wonkiness I was seeing though, so thanks for pointing me to it!

Here's a dump of the RAM sequences (in hex).

On initial load, RAM after 1 action and 2 actions:

0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

0094b8940094009402940000b000b000b000b000b0940000000000000000000094009400940094ff9400940041354135410000bd940094003500000000000000000000000000000000ff0000000000000094009400940094000000000707070707bf940094009409940105000000000094000000000d04040b05940000008db6

After restoreState(), taking the same sequence of actions:

0094b8940094009402940000b000b000b000b000b0940000000000000000000094009400940094ff9400940041354135410000bd940094003500000000000000000000000000000000ff0000000000000094009400940094000000000707070707bf940094009409940105000000000094000000000d04040b05940000008db6

0494b8940094009402940000b000b000b000b000b0940000000000000000000094439400940094ff9400940041354135410000bd940094003500000000000000000900000000000000ff0000000000000093009400940094000000000707070707bf940094009409940105000000000093ff0000000d04040b0493000000c9b4

So the state seen after 2 actions on initial load is seen after only 1 action after restoreState(). Looking at that, I think what's happening is that the very first action after loadROM() is eaten somehow: the RAM state after one action is all zeros, which doesn't seem right. So I think this might be something with initial setup rather than the clone/restore functionality?

JesseFarebro commented 3 years ago

Thanks for the script, I'll have a look at this in the next couple of days and get back to you. I have some ideas on what could be the issue but it'll take more investigation.

JesseFarebro commented 3 years ago

Thanks for reporting this @anadrome, I opened up a new issue #417 to track this.

anadrome commented 3 years ago

Did you mean to close pull request #413 instead? I think the weirdness with Qbert is something else (although i don't have a good guess what's causing it).

JesseFarebro commented 3 years ago

Hmm, I believe if we would apply #401 and perform processRAM() on restore (note: not performing processScreen()) it should result in the desired behaviour for the RAM buffer. If this doesn't solve the problem with Qbert I'll have to take a closer look to see what's going on.