google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0
528 stars · 74 forks

Getting 7 as action for a game with 3 actions #17

Closed arjung128 closed 2 years ago

arjung128 commented 3 years ago

I have been trying to train an online agent on the environment FreewayNoFrameskip-v4. Because this gym environment is not deterministic, I seeded the environment. Specifically, in atari_lib.py, I added

No other changes were made. I then used this repo to train an online agent.

In all of training, there was one instance of a 7 stored as the action (specifically, the last action in the very first action checkpoint in replay_logs), even though Freeway has only three actions; all other stored actions were in {0, 1, 2}. Any idea what could cause this, or has anything similar been observed? Changing this one 7 to the most common action isn't a problem, but if this arises repeatedly, and for other games, it could be difficult to deal with.
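For anyone who wants to inspect the stored actions themselves, here is a rough sketch of reading one action checkpoint. Dopamine-style replay logs are gzip-compressed numpy arrays; the helper name and the commented filename pattern below are assumptions, not part of the repo.

```python
import gzip

import numpy as np


def load_action_checkpoint(path):
    """Load one gzip-compressed numpy array of stored actions (sketch)."""
    with gzip.open(path, "rb") as f:
        return np.load(f)


# Hypothetical usage -- the filename pattern is an assumption:
# actions = load_action_checkpoint("replay_logs/$store$_action_ckpt.0.gz")
# print(np.unique(actions))  # for Freeway, expect a subset of {0, 1, 2}
```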

agarwl commented 3 years ago

> Any ideas what could be the cause of this?

My hunch is that in each of the checkpoint files, there are stack-size (typically 4) elements near the end that are never sampled from the buffer (see this function in the Dopamine replay buffer for more info). Pinging @psc-g to confirm this hypothesis.
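For reference, Dopamine's `invalid_range` helper computes the indices around the write cursor that the sampler skips; the sketch below is paraphrased from memory, so treat the exact signature as an assumption.

```python
import numpy as np


def invalid_range(cursor, replay_capacity, stack_size, update_horizon):
    """Indices near the cursor that must not be sampled (Dopamine-style sketch)."""
    return np.array(
        [(cursor - update_horizon + i) % replay_capacity
         for i in range(stack_size + update_horizon)])


# With the cursor wrapped back to index 0, the last slots written before it
# (plus the cursor itself and the next few) are excluded from sampling:
print(invalid_range(cursor=0, replay_capacity=1000, stack_size=4,
                    update_horizon=1))  # -> [999   0   1   2   3]
```

So a stray value sitting in the last stack-size slots of a checkpoint would never actually be drawn as a training transition.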

> Or has anything similar been observed? Going in and changing this one 7 to the most common action isn't a problem, but if this problem arises repeatedly, and for other games, it could be difficult to deal with.

I haven't observed anything like this, but I'll try to see if I can reproduce this behavior. Also, note that the network can't handle action 7 as input since Freeway has only 3 actions (it would throw an error), so my best guess is that these elements are in the invalid_range of the Dopamine replay buffer.
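If this does recur, a quick sanity check could flag any out-of-range actions in a loaded checkpoint before training; this is an illustrative sketch, not code from the repo.

```python
import numpy as np


def find_invalid_actions(actions, num_actions):
    """Return indices of stored actions outside [0, num_actions)."""
    actions = np.asarray(actions)
    return np.flatnonzero((actions < 0) | (actions >= num_actions))


# Toy data standing in for one action checkpoint; Freeway has 3 actions.
actions = np.array([0, 1, 2, 1, 7, 0])
bad = find_invalid_actions(actions, num_actions=3)
print(bad)  # -> [4]
```

Any index this reports near the end of a checkpoint is likely inside the unsampled invalid_range and can safely be overwritten with a valid action.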