slerman12 closed this issue 2 years ago
Yes, so there are two different sets of actions. The full action space, or legal action space, comprises all 18 actions in the ALE and is the recommended setting per the methodology set out in Revisiting the Arcade Learning Environment. The alternative is the minimal action space, which only returns the subset of actions that are valid in that specific environment. The API looks like:
import ale_py as ale
from ale_py.roms import Breakout

env = ale.ALEInterface()
env.loadROM(Breakout)

# Actions that are valid in this specific game
minimal_action_set = env.getMinimalActionSet()
# All 18 actions defined by the ALE
full_action_set = env.getLegalActionSet()
Hope that helps.
Thank you. Yes, it does. My code for loading is:
env = f'ALE/{args.env}-v5'
env = gym.make(env)
How would I convert this env between the two modes? Or should I load it some other way, similar to your example?
Also, not to ask too many things, but are sticky actions enabled by default, or is there a toggle? I'm using the Dopamine wrapper class and it doesn't seem to use sticky actions.
You would do the following,
env = gym.make(f'ALE/{args.env}-v5', full_action_space=False)
True is the default, so set it to False if you want the minimal action space.
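To make the relationship between the two modes concrete, here is a small sketch of how a minimal action set maps into the full 18-action legal set. The index values below are illustrative (a plausible NOOP/FIRE/RIGHT/LEFT subset for a Breakout-like game), not read from a specific ROM, and `to_full_action` is a hypothetical helper, not part of the ALE or Gym API:

```python
# The ALE defines the same 18 legal actions for every game
FULL_ACTION_SET = list(range(18))

# Hypothetical minimal set for a Breakout-like game: NOOP, FIRE, RIGHT, LEFT
minimal_action_set = [0, 1, 3, 4]

def to_full_action(minimal_index: int) -> int:
    """Translate an agent's action index under the minimal action space
    into the corresponding index in the full (legal) action space."""
    return minimal_action_set[minimal_index]

# An agent trained with full_action_space=False outputs indices 0..3;
# internally these resolve to ALE actions 0, 1, 3, 4.
print([to_full_action(a) for a in range(len(minimal_action_set))])  # [0, 1, 3, 4]
```

The point is that the minimal space only relabels a subset of the legal space; an agent trained on one cannot be dropped into the other without remapping its outputs.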
RE: sticky actions, they are enabled by default. The Dopamine wrapper doesn't apply sticky actions because it relies on the environment having this implemented (in their code you can see this when they initialize either v0 or v4, depending on whether they want sticky actions). The v5 environments have them enabled by default.
Thanks! Is there a disable toggle? I think it helps to compare different methods for more standardized benchmarking, e.g., primarily reporting the recommended settings but also providing some measure of performance under the easier, more commonly used settings (even despite the differences between v5 and earlier variants).
Yes, you can disable it through the repeat_action_probability kwarg to the environment. For more info see the blog post here: https://brosa.ca/blog/ale-release-v0.7/#openai-gym
env = gym.make(f'ALE/{args.env}-v5', repeat_action_probability=0.0)
A probability of 0 will completely disable sticky actions.
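For intuition, the sticky-action mechanism itself is simple: at each step, with probability repeat_action_probability the emulator repeats the previous action instead of the one the agent requested. Here is a minimal pure-Python sketch of that logic (the function name and loop are illustrative, not ALE internals):

```python
import random

def sticky_step(requested_action, previous_action,
                repeat_action_probability=0.25, rng=random):
    """With probability repeat_action_probability, ignore the requested
    action and repeat the previous one. The v5 default is 0.25;
    repeat_action_probability=0.0 disables stickiness entirely."""
    if rng.random() < repeat_action_probability:
        return previous_action
    return requested_action

rng = random.Random(1)  # seeded for a reproducible demo
prev = 0
executed = []
for requested in [1, 2, 3, 4, 5]:
    prev = sticky_step(requested, prev, 0.25, rng)
    executed.append(prev)
print(executed)  # [0, 2, 3, 4, 5] -- the first request was "stuck" on action 0
```

With the probability set to 0.0 the condition never fires, which is why that value completely disables sticky actions.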
Ah, now I'm a bit confused. I looked at that blog post and saw that frame_skip is a parameter (default = 5?), but the Dopamine class has its own frame skip set to 4... As in, they ADD a frame_skip in their wrapper, on top of any default settings.
Yes, it is a little confusing. Dopamine performs further post-processing not found in the Gym environment, for example, taking the max over subsequent frames, downsampling frames, etc. To be able to do this post-processing they need to implement a frame-skip manually. If you're just using their wrapper class you should make sure your frame-skip on the Gym environment is set to 1.
The difference between a frame-skip of 4 versus 5 is also confusing, and this will actually be changing (i.e., the v5 default will become 4). You should use a frame-skip of 4; this is what's most common in the Deep RL literature.
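To see how the wrapper-side frame-skip and max-pooling fit together, here is a toy sketch of the standard recipe: repeat the action for `skip` raw frames, sum the rewards, and max-pool the last two frames. The frames here are just short lists standing in for images, and `frame_skip_step` is an illustrative helper, not Dopamine's actual code:

```python
def frame_skip_step(env_step, action, skip=4):
    """Repeat `action` for `skip` raw frames, sum the rewards, and
    max-pool the last two frames. `env_step` stands in for a raw,
    single-frame environment step."""
    total_reward = 0.0
    last_two = []
    for _ in range(skip):
        frame, reward = env_step(action)
        total_reward += reward
        last_two = (last_two + [frame])[-2:]
    # Element-wise max over the last two frames hides sprite flicker
    pooled = [max(a, b) for a, b in zip(*last_two)]
    return pooled, total_reward

# Toy "environment": frames alternate which pixel is lit, like a
# flickering sprite that is only drawn every other frame
frames = iter([[9, 0], [0, 9], [9, 0], [0, 9]])
step = lambda action: (next(frames), 1.0)

obs, reward = frame_skip_step(step, action=0, skip=4)
print(obs, reward)  # [9, 9] 4.0 -- the pooled frame shows both pixels
```

This is why the Gym-side frame-skip must be 1 when using such a wrapper: otherwise you'd skip 4 frames inside Gym and then another 4 (or 5) in the wrapper.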
Ah, I see that they do a max pool. Last question I think (thanks so much for clarifying everything!): what about frame stacking? As in, stacking frames rather than pooling. Dopamine only pools 2, and I think longer sequences might be of interest too.
So frame pooling, as we'll call it, is used because of the way Atari renders frames: not all sprites may be rendered on every frame. For example, the bullets in Space Invaders are rendered only every other frame.
Frame stacking is used so the agent has some notion of relative movement, e.g., which direction the ball is moving in Pong. In Dopamine, they'll return a single frame from the environment wrapper. They implement frame stacking when they sample experience from the replay buffer. So they insert a single frame into the replay buffer but construct a stack of 4 subsequent frames when sampling experience from the buffer.
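The store-one-frame, sample-a-stack idea can be sketched in a few lines. This is a simplified illustration of the pattern, not Dopamine's actual replay-buffer API; the class and method names are made up, and the "frames" are just timestep integers:

```python
class TinyReplayBuffer:
    """Sketch of Dopamine-style storage: single frames go in,
    stacks of `stack_size` frames come out."""

    def __init__(self, stack_size=4):
        self.frames = []
        self.stack_size = stack_size

    def add(self, frame):
        # Only one frame per timestep is stored, saving 4x memory
        # versus storing a full stack at every step
        self.frames.append(frame)

    def sample_stack(self, index):
        """Return the stack_size frames ending at `index`, padding with
        the earliest frame near the start of the buffer."""
        start = max(0, index - self.stack_size + 1)
        stack = self.frames[start:index + 1]
        while len(stack) < self.stack_size:
            stack.insert(0, stack[0])
        return stack

buf = TinyReplayBuffer()
for t in range(6):
    buf.add(t)  # store one "frame" (here, just its timestep) per step

print(buf.sample_stack(5))  # [2, 3, 4, 5]
print(buf.sample_stack(0))  # [0, 0, 0, 0] -- padded at the episode start
```

Longer stacks would just mean a larger `stack_size` at sampling time; the storage cost per step stays the same, which is the appeal of this design.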
Hope that answers your question.
Thank you so much!
Sorry, what is the default/recommended sticky action repeat_action_probability?
1/4, so repeat_action_probability=0.25.
And one more consideration: I assume some of these settings are disabled for eval mode? Or must the agent perform under the same settings in both modes? Sticky actions and frame skip might both hurt the agent's evaluation performance in ways that have nothing to do with the agent itself, making standardized comparisons between algorithms hard without many random seeds.
I would suggest reading the recommendations on evaluation in Revisiting the Arcade Learning Environment. This blog post is also an interesting read: https://jacobbuckman.com/2019-09-23-automation-via-reinforcement-learning/ The author talks about this exact question of deciding between including versus excluding these sorts of parameters during evaluation (if you were to perform an explicit evaluation). You should also check out this paper: https://agarwl.github.io/rliable/ which comes with a nice library that will help you perform a proper empirical evaluation.
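On the "many random seeds" worry: one of the rliable paper's recommendations is to aggregate runs with the interquartile mean (IQM) rather than the plain mean, since it is far less sensitive to a few outlier seeds. Here is a simplified pure-Python sketch of that statistic (the library itself does more, e.g., stratified bootstrap confidence intervals, and this version assumes the run count is divisible by 4):

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of scores: drop the bottom and top
    quartiles, then average what remains. Simplified sketch of the
    aggregate metric recommended by the rliable paper."""
    s = sorted(scores)
    n = len(s)
    lo, hi = n // 4, n - n // 4  # indices bounding the middle 50%
    return sum(s[lo:hi]) / (hi - lo)

# Example: 8 runs where one failed seed and one lucky seed would
# distort the plain mean (which is ~2.0 here)
runs = [0.1, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 9.0]
print(interquartile_mean(runs))  # 1.15
```

The IQM of these runs is 1.15, close to the typical run, whereas the plain mean is pulled up to 2.0 by the single outlier; that robustness is what makes comparisons across algorithms more trustworthy with few seeds.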
I noticed that the action dimensionality returned for Pong is 18. I'm not sure about the other environments, but I'm guessing it's because the action space is standardized across all environments? Pong should really only have 2 or 3 feasible actions.
Is this the recommended Atari setup?