Farama-Foundation / Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.

Action space of 18 for Pong? #441

Closed slerman12 closed 2 years ago

slerman12 commented 2 years ago

I noticed that the action dimensionality returned for Pong is 18. I'm not sure about the other environments, but I'm guessing it's because the action space is standardized across all environments? Pong should really only have 2 or 3 feasible actions.

Is this the recommended Atari setup?

JesseFarebro commented 2 years ago

Yes, so there are two different sets of actions. The full action space or legal action space comprises all 18 actions in the ALE and is recommended as of the latest methodology set out in Revisiting the Arcade Learning Environment. The alternative is the minimal action space that only returns a subset of actions that are valid in that specific environment. The API looks like,

import ale_py as ale
from ale_py.roms import Breakout

env = ale.ALEInterface()
env.loadROM(Breakout)

# Only the actions that actually do something in this particular game.
minimal_action_set = env.getMinimalActionSet()
# All 18 ALE actions, regardless of the game.
full_action_set = env.getLegalActionSet()
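
For example, you can compare the sizes of the two sets (a quick sketch using Pong; the legal set is always 18, while Pong's minimal set only has a handful of actions):

import ale_py as ale
from ale_py.roms import Pong

env = ale.ALEInterface()
env.loadROM(Pong)

print(len(env.getLegalActionSet()))    # always 18
print(len(env.getMinimalActionSet()))  # much smaller for Pong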

Hope that helps.

slerman12 commented 2 years ago

Thank you. Yes, it does. My code for loading is:

import gym

env = f'ALE/{args.env}-v5'
env = gym.make(env)

How would I convert this env between the two modes?

Or should I load some other way similar to your example?

slerman12 commented 2 years ago

Also, not to ask too many things, but are sticky actions enabled by default, or is there a toggle? I'm using the Dopamine wrapper class and it doesn't seem to use sticky actions.

JesseFarebro commented 2 years ago

You would do the following,

env = gym.make(f'ALE/{args.env}-v5', full_action_space=False)

True is the default, so set it to False if you want the minimal action space.
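
As a quick sanity check (a sketch, using Pong as a concrete example), env.action_space reflects whichever set you pick:

import gym

# Minimal action space: only the actions Pong responds to.
env = gym.make('ALE/Pong-v5', full_action_space=False)
print(env.action_space)  # a small Discrete space

# Full action space: all 18 ALE actions.
env = gym.make('ALE/Pong-v5', full_action_space=True)
print(env.action_space)  # Discrete(18)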

RE: sticky actions, they are enabled by default in the v5 environments. The Dopamine wrapper doesn't apply sticky actions itself because it relies on the environment having them implemented (in their code you can see this where they initialize either v0 or v4 depending on whether they want sticky actions).

slerman12 commented 2 years ago

Thanks! Is there a disable toggle? I think it helps with more standardized benchmarking to compare different methods, e.g. primarily reporting the recommended settings but also providing some measure of performance under the easier, more commonly used settings (even despite the differences between v5 and the earlier variants).

JesseFarebro commented 2 years ago

Yes, you can disable it through the repeat_action_probability kwarg to the environment. For more info see the blog post here: https://brosa.ca/blog/ale-release-v0.7/#openai-gym

env = gym.make(f'ALE/{args.env}-v5', repeat_action_probability=0.0)

A probability of 0 will completely disable sticky actions.

slerman12 commented 2 years ago

Ah, now I'm a bit confused. I looked at that blog post and saw that frame_skip is a parameter (default = 5?), but the Dopamine class has its own frame-skip set to 4...

slerman12 commented 2 years ago

As in, they ADD a frame_skip in their wrapper, in addition to any default settings.

JesseFarebro commented 2 years ago

Yes, it is a little confusing. Dopamine performs further post-processing not found in the Gym environment, for example, taking the max over subsequent frames, downsampling frames, etc. To be able to do this post-processing they need to implement a frame-skip manually. If you're just using their wrapper class you should make sure your frame-skip on the Gym environment is set to 1.

The difference between a frame-skip of 4 versus 5 is also confusing, and this will actually be changing (i.e., the v5 default will be 4). You should use a frame-skip of 4; this is what's most common in the deep RL literature.
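
For example, something like this keeps the frame-skipping in one place (a sketch, assuming the frameskip kwarg described in the blog post above):

import gym

# With Dopamine's wrapper: let the wrapper do the skip (4) and max-pooling,
# so the underlying environment should not skip frames itself.
env = gym.make(f'ALE/{args.env}-v5', frameskip=1)

# Without Dopamine's wrapper: skip frames in the environment instead.
env = gym.make(f'ALE/{args.env}-v5', frameskip=4)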

slerman12 commented 2 years ago

Ah, I see that they do a max pool. Last question, I think (thanks so much for clarifying everything!): what about frame stacking? As in, stacking frames rather than pooling. Dopamine only pools over 2 frames, and I think longer sequences might be of interest too.

JesseFarebro commented 2 years ago

So frame pooling, as we'll call it, is used because of the way Atari renders frames: not all sprites are necessarily rendered on every frame. For example, the bullets in Space Invaders are rendered every other frame.

Frame stacking is used so the agent has some notion of relative movement, e.g., which direction the ball is moving in Pong. In Dopamine, the environment wrapper returns a single frame; frame stacking happens when experience is sampled from the replay buffer. So they insert single frames into the buffer but construct a stack of 4 consecutive frames when sampling.
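
If you want to experiment with longer stacks outside of Dopamine's replay buffer, a simple deque-based wrapper is enough (a minimal sketch, not Dopamine's implementation; the stack size and the old-style Gym step signature are assumptions):

from collections import deque

import numpy as np

class FrameStack:
    # Minimal sketch: keep the last k observations and return them stacked.
    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.k):
            self.frames.append(obs)
        return np.stack(self.frames, axis=0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.stack(self.frames, axis=0), reward, done, info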

Hope that answers your question.

slerman12 commented 2 years ago

Thank you so much!

slerman12 commented 2 years ago

Sorry, what is the default/recommended sticky action repeat_action_probability?

slerman12 commented 2 years ago

And one more consideration: I assume some of these settings are disabled for eval mode? Or must the agent perform under the same settings in both modes? Sticky actions and frame skip might both hurt the agent's evaluation performance in ways that have nothing to do with the agent, making standardized comparisons between algorithms hard without many random seeds.

JesseFarebro commented 2 years ago

Sorry, what is the default/recommended sticky action repeat_action_probability?

1/4, so repeat_action_probability=0.25.

And one more consideration: I assume some of these settings are disabled for eval mode? Or must the agent perform under the same settings in both modes? Sticky actions and frame skip might both hurt the agent's evaluation performance in ways that have nothing to do with the agent, making standardized comparisons between algorithms hard without many random seeds.

I would suggest reading the recommendations on evaluation in Revisiting the Arcade Learning Environment. This blog post is also an interesting read: https://jacobbuckman.com/2019-09-23-automation-via-reinforcement-learning/. The author talks about this exact question of deciding between including versus excluding these sorts of parameters during evaluation (if you were to perform an explicit evaluation). You should also check out this paper: https://agarwl.github.io/rliable/, which comes with a nice library that will help you perform a proper empirical evaluation.
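
If it's useful, here is a rough sketch of what using rliable for aggregate metrics can look like (based on its README; the score matrix below is a made-up placeholder):

import numpy as np
from rliable import library as rly
from rliable import metrics

# Placeholder data: scores[algorithm] is a (num_runs x num_games) matrix
# of human-normalized scores -- substitute your own evaluation results.
scores = {'my_agent': np.random.rand(5, 10)}

# Interquartile mean (IQM) with stratified bootstrap confidence intervals.
aggregate_func = lambda x: np.array([metrics.aggregate_iqm(x)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    scores, aggregate_func, reps=2000)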