Farama-Foundation / Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.
GNU General Public License v2.0

Space Invaders terminates unexpectedly #408

Closed willtop closed 3 years ago

willtop commented 3 years ago

I have encountered this problem frequently when using ALE to play Space Invaders and couldn't find any error in my code. Essentially, once the agent accumulates a certain score (generally above 1000), the game often just terminates without any error message, yet reports ale.game_over()=True and ale.lives()=0, even though the player was not hit by any bullet and had more than one life remaining. Such unexpected terminations typically happen around frame 1000 (sometimes even earlier), so they are definitely not hitting the maximum-frame limit (nor anywhere close to five minutes of game-play time).

I have tried the ALE class built from the source code of this repo as well as the ALE class from the atari_py package (https://github.com/openai/atari-py), and I have tried ROMs from several different sources. The error happens for every combination.

I just want to ask whether this is a previously encountered problem. If so, how should I verify that the ALE class and ROM file I am using are not the cause (for some combinations there are warnings about a mismatched MD5 sum, while for others there is no such warning)?

If this doesn't sound like a problem with the ALE class or the ROM file, then I can set those doubts aside and look more closely at my own code (although I have checked and currently can't find anything in it that would cause this).

Thanks in advance!

JesseFarebro commented 3 years ago

Hi @willtop, if the issue is frequent, would it be possible to record the action sequence? When recording it, just make sure the environment seed is fixed, i.e., ale.setInt('random_seed', 1). Could you also post the MD5 of the ROM you're using?

I have personally never had any issues with Space Invaders so let's see if we can isolate this to the environment or not.
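
If it helps, a minimal recording sketch could look something like the following (the ROM path and the random stand-in policy are placeholders, not something from your setup):

import random
from ale_py import ALEInterface

ale = ALEInterface()
ale.setInt("random_seed", 1)                # fixed seed so the episode is reproducible
ale.loadROM("space_invaders.bin")           # placeholder ROM path
action_set = ale.getMinimalActionSet()

recorded_actions = []
while not ale.game_over():
    action = random.randrange(len(action_set))  # placeholder for the agent's action choice
    recorded_actions.append(action)
    ale.act(action_set[action])

print(recorded_actions)                     # paste this list into the issue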

willtop commented 3 years ago

Hello Jesse, thanks again for your prompt help! I checked the last 20 actions performed prior to termination, and saw no particular patterns.

Here are the MD5 sums of the ROM files I've tried (the agent's performance on the latter two is noticeably worse, likely due to a color-theme change in the game objects):

  72ffbef6504b75e69ee1045af9075f66 (the one the ALE compiled from this repository expects, based on the log info it automatically prints)
  61dbe94f110f30ca4ec524ae5ce2d026
  f1b7edff81ceef5af7ae1fa76c8590fc
  15b9f5e2439bfaa08874b5184261c777
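
(For anyone checking their own copy, one quick way to compute a ROM's MD5 is with Python's hashlib; the filename below is just a placeholder:)

import hashlib

with open("space_invaders.bin", "rb") as f:   # placeholder path to the ROM file
    print(hashlib.md5(f.read()).hexdigest())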

JesseFarebro commented 3 years ago

So is it possible to record all the actions from the beginning of the episode and post them here? This way, I might be able to reproduce the problem. Just make sure to fix the random seed before recording the action sequence.

Thanks for the MD5 info; I don't think this will be the root cause. I'd continue to use 72ffbef6504b75e69ee1045af9075f66. If possible, make sure this is the MD5 you record the action sequence on.

willtop commented 3 years ago

Thanks for the ROM MD5 suggestion! Since this doesn't seem to be a well-known problem, I am guessing it's something in my code. Because it happens rather frequently for me and the action sequences vary drastically between runs, it doesn't seem to be sensitive to the particular actions, but I understand that being able to reproduce the result is a good way to debug.

Here is one sequence of actions that triggered the game to terminate directly (essentially jump directly from having 3 lives to 0 life remaining): [1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 0, 1, 1, 1, 1, 5, 2, 1, 5, 2, 1, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 5, 1, 1, 1, 4, 2, 5, 5, 5, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 4, 1, 1, 5, 0, 0, 4, 0, 0, 4, 5, 5, 5, 5, 4, 4, 5, 4, 4, 0, 5, 0, 1, 1, 1, 1, 1, 1, 1, 1, 4, 0, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 2, 5, 1, 4, 4, 1, 1, 4, 0, 1, 1, 1, 4, 4, 4, 4, 4, 1, 1, 2, 1, 1, 1, 4, 1, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1, 1, 3, 1, 2, 1, 1, 0, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 4, 2, 2, 2, 2, 5, 5, 5, 2, 5, 1, 1, 2, 2, 2, 4, 4, 4, 4, 4, 4, 2, 2, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 2, 2, 4, 4, 2, 5, 2, 5, 2, 2, 2, 5, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 2, 2, 5, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 5, 5, 5, 5, 5, 0, 5, 5, 5, 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 2, 1, 1, 4, 4, 1, 4, 4, 4, 4, 0, 4, 4, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 4, 1, 2, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 2, 2, 3, 2, 2, 3, 5, 5, 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 3, 2, 4, 4, 4, 1, 5, 2, 1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 3, 3, 2, 2, 2, 2, 5, 5, 1, 1, 5, 5, 5, 5, 5, 5, 1, 5, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 5, 2, 1, 5, 1, 5, 5, 5, 5, 1, 1, 5, 1, 1, 1, 5, 1, 2, 5, 5, 1, 1, 5, 5, 5, 5, 1, 1, 5, 4, 1, 3, 1, 5, 5, 5, 4, 4, 5, 1, 0, 4, 3, 5, 1, 1, 4, 4, 5, 5, 4, 4, 0, 4, 5, 0, 5, 5, 4, 4, 1, 5, 4, 4, 1, 4, 4, 1, 5, 1, 4, 1, 5, 1, 1, 1, 1, 1, 3, 3, 0, 3, 5, 5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 1, 2, 5, 5, 2, 5, 2, 5, 5, 5, 5, 4, 2, 4, 0, 4, 1, 5, 5, 1, 0, 0, 5, 5, 5, 5, 5, 2, 2, 0]

This terminates after 790 frames with a score of 520. I have set the random seed for everything (random, numpy, torch, the PYTHONHASHSEED environment variable) to 12321.

Thanks a ton!

willtop commented 3 years ago

Just want to confirm that the above results are identically reproducible: I just tried on a different computer, with a different ROM (another one listed above), using a different ALE class (the atari_py package instead of the official ALE compiled from this GitHub repo, which I used for the last post). So I don't think it's due to a particular defective ROM, ALE implementation, or even computing platform. However, I really couldn't find anything wrong within my code either.

JesseFarebro commented 3 years ago

So I'm not able to reproduce even that score with the actions you sent. Would it be possible to get a minimal reproduction in code? If it's too hard to decouple and you feel comfortable emailing it to me you can do that as well. I was trying something along the lines of,

import ale_py
import sys

actions = [] # Sequence of actions you sent.

ale = ale_py.ALEInterface()
ale.setInt("random_seed", 12321)
ale.setInt("frame_skip", 4)
ale.loadROM(sys.argv[1])

action_set = ale.getMinimalActionSet()

episode_return = 0
for action in actions:
    episode_return += ale.act(action_set[action])

print(episode_return)

I had assumed you were using the minimal action set because your actions range from 0 to 5. I'm also not sure what your frameskip value is.

willtop commented 3 years ago

Hello Jesse, thanks again for your help! Sorry that I forgot to specify a lot of details (including that each action is repeated 4 times, as I was trying to reproduce the DQN paper results). My apologies.

I have extracted a simple script that reproduces the error. Please ignore the commented lines; I was trying different ALEs and ROMs. Essentially, when running the script below, the environment terminates after 604 frames with a reward of 320. When visualizing the game play, the agent still had two lives and the game suddenly terminated (without the agent being hit, the targets being exhausted, or any other special circumstance occurring in the game).

import random
import os
import numpy as np
import torch
#import atari_py
#from atari_py import ALEInterface
from ale_py import ALEInterface
# Set random seed
RANDOM_SEED = 1234
os.environ['PYTHONHASHSEED'] = str(RANDOM_SEED)
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

ale = ALEInterface()
ale.setInt("random_seed", RANDOM_SEED)
ale.setFloat("repeat_action_probability", 0)
ale.setBool("display_screen", True)
#ale.loadROM(atari_py.get_game_path("space_invaders"))
ale.loadROM("space_invaders.bin")
env_actions = ale.getMinimalActionSet()

actions = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 2, 2, 2, 4, 2, 2, 2, 2, 1, 1, 1, 1, 0, 1, 1, 2, 1, 1, 1, 0, 1, 5, 2, 1, 1, 5, 4, 1, 1, 4, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 4, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 3, 5, 5, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 5, 5, 5, 1, 5, 5, 5, 1, 5, 5, 5, 1, 5, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 3, 5, 5, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 4, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 3, 2, 1, 1, 1, 2, 0, 2, 2, 2, 1, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 1, 2, 4, 4, 2, 2, 2, 2, 5, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 4, 4, 4, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4, 4, 1, 2, 1, 4, 4, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 2, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 5, 2, 5, 4, 4, 4, 4, 4, 4, 3, 4, 4, 3, 5, 3, 2, 2, 3, 1, 1, 1, 3, 3, 1, 1, 1, 1, 2, 4, 4, 4, 2, 0, 4, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 1, 5, 5, 5, 1, 2, 4, 5, 2, 0, 0]
frames = 0
score = 0
# start environment with a number of no-op actions
for i in range(random.randrange(30)):
    ale.act(0)  # assume action indexed by 0 is no-op
    if ale.game_over():
        ale.reset_game()

lives = ale.lives()
print("Starting. Lives: ", lives)
for action in actions:
    for i in range(4):
        score += ale.act(env_actions[action])
    if ale.game_over():
        break
    if lives > ale.lives():
        print("Frames: {}, Score: {}, Lives: {}".format(frames, score, ale.lives()))
        lives = ale.lives()
        # take one no-op action after a life is lost
        ale.act(0)
    frames += 1

print("[Final] Frames: {}, Score: {}, Lives: {}".format(frames, score, ale.lives()))

Let me know if you can recreate this error. Thanks again for your help!

JesseFarebro commented 3 years ago

So I actually think everything looks okay. There's something weird with how we detect the Space Invaders terminal state when the invaders drop past the last row; it seems the terminal signal is triggered ~30 frames too early. This might look odd when you're watching the display with display_screen, but as an alternative, try this:

  1. Make a directory, frames.
  2. Edit your script and add:
     ale.setString("record_screen_dir", "frames")
  3. In the game_over check, you can print out the frame number, i.e.,
     if ale.game_over():
         print(f"Game over on frame {frames * 4}")
         break

     and you'll see that the terminal state was triggered on frame 2416, but in the frames directory we have frames until 2445. You can create a visualization of the entire episode or inspect frames 2417 - 2445 to see what happens. I made a gif for you here:

[GIF of the final frames of the episode]

Also, it's worth noting that in Space Invaders, if the aliens reach the bottom row where the player's ship sits, the game is over regardless of how many lives remain.
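
Putting the steps above together, a minimal sketch might look like this (the seed, ROM path, and stand-in policy are placeholders; substitute your recorded actions):

import os
from ale_py import ALEInterface

ale = ALEInterface()
ale.setInt("random_seed", 1234)                 # placeholder seed
os.makedirs("frames", exist_ok=True)            # create the directory first (step 1 above)
ale.setString("record_screen_dir", "frames")    # dump every emulator frame to frames/
ale.loadROM("space_invaders.bin")               # placeholder ROM path
action_set = ale.getMinimalActionSet()

frames = 0
while not ale.game_over():
    ale.act(action_set[0])                      # stand-in policy; replace with your recorded actions
    frames += 1

print(f"Game over on frame {frames}")           # compare against the last image written to frames/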

willtop commented 3 years ago

Hello Jesse, this makes a lot of sense! I wasn't aware of the rule that the game terminates when the aliens reach the bottom, and as you pointed out, I couldn't quite see it in the SDL visualization. I guess recording frames for visualization is a better strategy.

Just would like to confirm a couple of last things:

  1. When running it on your computer, were you able to see the same frame number and score as I mentioned in the previous post (i.e., with the termination signal issued ~30 frames too early)? I would just like to confirm there is nothing wrong with my ALE class and ROMs.
  2. Are the people at DeepMind working with the same software setup (with this termination signal issued ~30 frames earlier than it should be) when they train their DQN model? As I am trying to reproduce their results and currently am unable to do so, I am wondering if this could be one of the reasons.

Sincere thanks for helping me clarify this confusion! I truly appreciate your help!

JesseFarebro commented 3 years ago

  1. When running it on your computer, were you able to see the same frame number and score as I mentioned in the previous post (i.e., with the termination signal issued ~30 frames too early)? I would just like to confirm there is nothing wrong with my ALE class and ROMs.

Yes, I got the same return: 320 in 2416 frames.

  2. Are the people at DeepMind working with the same software setup (with this termination signal issued ~30 frames earlier than it should be) when they train their DQN model? As I am trying to reproduce their results and currently am unable to do so, I am wondering if this could be one of the reasons.

This is a loaded question. There have been quite a few methodologies over the years w.r.t. the ALE. I wouldn't recommend trying to reproduce the DeepMind DQN results for a couple of reasons:

  1. The methodology they used is pretty outdated now. I'd suggest following the methodology outlined in Revisiting the Arcade Learning Environment by Machado et al.
  2. Even if you used the DQN Nature methodology, it's unlikely you'll be able to get performance similar to DeepMind's original paper. The results reported in their paper are the max performance over all runs; they were trying to demonstrate human-level capabilities. There's quite a bit of variance in some games, and returns can be multimodal, so the performance reported in papers may look quite different from your results over a few seeds. As long as the results are in the same ballpark, I wouldn't worry too much. The paper I listed in (1) provides results using their methodology, and I believe these would be more attainable.

If you're implementing DQN to learn, that's great. On the other hand, if you don't fall into this category, I'd recommend using Dopamine, or similar frameworks that implement recent value-based agents on Atari using the latest methodologies.

If you have any more questions, let me know.

willtop commented 3 years ago

Hello Jesse, this information is awesome! Dopamine in particular looks like a very interesting repository, and I will definitely check it out.

Currently there is so much research online that it's hard to grasp what the state of the art is and which official or organized implementations to look into. It's strange that DeepMind never published their code for the series of DQN works.

I am also aware of the Acme repository, but I have yet to understand how it overlaps with the Dopamine repo. As for the line of DQN works, to my knowledge the state of the art is Rainbow, yet I couldn't find any official repo implementing it. Now that I am looking at Dopamine, this seems to be it.

Thanks again for this info and all the help you've provided. I truly appreciate it!

JesseFarebro commented 3 years ago

So DeepMind actually did publish their DQN source, although it's written in Lua and uses an old ALE wrapper. You can check it out here.

State of the art is a tough topic; it really depends on what research question you're asking. Trying to get a higher number than, say Rainbow, might not be the best way to go about things. Knowing what you should benchmark your agent against is a difficult question in and of itself. If you want to add something on top of value-based deep RL methods, I think it's fair to start with DQN. When you start combining all these different methods that go into Rainbow, it can be hard to isolate specific effects empirically.

No problem! 😄

willtop commented 3 years ago

Hello Jesse, it will be interesting and a bit challenging to read Lua, but I will give it a go.

This advice is great. I will start with plain DQN and go from there.

Thanks a ton for your kind help! I really appreciate that!