google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0
528 stars · 74 forks

Getting 7 as action for a game with 3 actions #17

Closed arjung128 closed 2 years ago

arjung128 commented 3 years ago

I have been trying to train an online agent on the environment FreewayNoFrameskip-v4. Because this gym environment is not deterministic, I seeded the environment. Specifically, in atari_lib.py, I added

No other changes were made. I then used this repo to train an online agent.

In all of training, there was one instance of a 7 stored as the action (specifically, the last action in the very first action checkpoint in replay_logs), even though Freeway has only three actions; all other stored actions were in {0, 1, 2}. Any idea what could cause this, or has anything similar been observed? Changing this one 7 to the most common action isn't a problem, but if this arises repeatedly, and for other games, it could be difficult to deal with.
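For anyone who wants to inspect the stored actions themselves, here is a rough sketch of reading one action checkpoint. Dopamine-style replay logs are gzip-compressed numpy arrays; the helper name and the commented filename pattern below are assumptions, not part of the repo.

```python
import gzip

import numpy as np


def load_action_checkpoint(path):
    """Load one gzip-compressed numpy array of stored actions (sketch)."""
    with gzip.open(path, "rb") as f:
        return np.load(f)


# Hypothetical usage -- the filename pattern is an assumption:
# actions = load_action_checkpoint("replay_logs/$store$_action_ckpt.0.gz")
# print(np.unique(actions))  # for Freeway, expect a subset of {0, 1, 2}
```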

agarwl commented 3 years ago

> Any ideas what could be the cause of this?

My hunch is that in each of the checkpoint files, there are stack-size (typically 4) elements near the end that are never sampled from the buffer (see this function in the Dopamine replay buffer for more info). Pinging @psc-g to confirm this hypothesis.
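For reference, Dopamine's `invalid_range` helper computes the indices around the write cursor that the sampler skips; the sketch below is paraphrased from memory, so treat the exact signature as an assumption.

```python
import numpy as np


def invalid_range(cursor, replay_capacity, stack_size, update_horizon):
    """Indices near the cursor that must not be sampled (Dopamine-style sketch)."""
    return np.array(
        [(cursor - update_horizon + i) % replay_capacity
         for i in range(stack_size + update_horizon)])


# With the cursor wrapped back to index 0, the last slots written before it
# (plus the cursor itself and the next few) are excluded from sampling:
print(invalid_range(cursor=0, replay_capacity=1000, stack_size=4,
                    update_horizon=1))  # -> [999   0   1   2   3]
```

So a stray value sitting in the last stack-size slots of a checkpoint would never actually be drawn as a training transition.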

> Or has anything similar been observed? Going in and changing this one 7 to the most common action isn't a problem, but if this problem arises repeatedly, and for other games, it could be difficult to deal with.

I haven't observed anything like this, but I'll try to see if I can reproduce this behavior. Also, note that the network can't handle action 7 as input since Freeway has only 3 actions (it would throw an error), so my best guess is that these elements are in the invalid_range of the Dopamine replay buffer.
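If this does recur, a quick sanity check could flag any out-of-range actions in a loaded checkpoint before training; this is an illustrative sketch, not code from the repo.

```python
import numpy as np


def find_invalid_actions(actions, num_actions):
    """Return indices of stored actions outside [0, num_actions)."""
    actions = np.asarray(actions)
    return np.flatnonzero((actions < 0) | (actions >= num_actions))


# Toy data standing in for one action checkpoint; Freeway has 3 actions.
actions = np.array([0, 1, 2, 1, 7, 0])
bad = find_invalid_actions(actions, num_actions=3)
print(bad)  # -> [4]
```

Any index this reports near the end of a checkpoint is likely inside the unsampled invalid_range and can safely be overwritten with a valid action.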