google-research / seed_rl

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture.
Apache License 2.0

num_action_repeats=1 flag correct for Atari? #76

Closed holger-m closed 1 year ago

holger-m commented 2 years ago

In common_flags.py the num_action_repeats flag is set to 1, which seems to cause an issue for Atari: with this setting, atari_preprocessing.py appears to generate a constant screen. Perhaps num_action_repeats should be set to 4 instead. This might be related to issues #75 and #51.
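For context, here is a minimal sketch of how Atari action repeat is usually combined with two-frame max-pooling. This is illustrative only, not SEED RL's actual atari_preprocessing.py, and it assumes the older Gym API where `env.step` returns a 4-tuple; the interaction between the repeat count and the two-frame pooling buffer is where the constant-screen behaviour described above could plausibly arise.

```python
import collections
import numpy as np


def repeat_action(env, action, num_action_repeats):
  """Illustrative action-repeat step with two-frame max-pooling.

  Sketch of the common dopamine-style Atari preprocessing pattern, not
  SEED RL's actual atari_preprocessing.py; assumes the older Gym API
  where env.step returns (obs, reward, done, info).
  """
  last_frames = collections.deque(maxlen=2)  # keep only the final two raw frames
  total_reward = 0.0
  done = False
  for _ in range(num_action_repeats):
    frame, reward, done, _ = env.step(action)
    last_frames.append(frame)
    total_reward += reward
    if done:
      break
  # Max-pooling over two consecutive frames removes Atari sprite flicker.
  # With num_action_repeats=1 only a single frame is ever available to the
  # pooling step, whereas the real preprocessing keeps a fixed two-slot
  # buffer, which is where the reported constant screen could come from.
  observation = np.maximum.reduce(list(last_frames))
  return observation, total_reward, done
```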

kimbring2 commented 2 years ago

@holger-m I tried to train on the explore_goal_locations_small map of DMLab using the default IMPALA parameters. The maximum reward reaches around 200, but after peaking it collapses back to a low reward.

Therefore, I ran DeepMind's original IMPALA code on the same environment. That code works well, as shown in the log below.

[Plot: explore_goal_locations_small episode return]

As you mentioned, the num_action_repeats parameter in the original code is 4. Furthermore, I found that the reward clipping parameter also differs from the original.

I am now training Seed RL with the reward clipping parameter adjusted to 1.0 and will report back if it works.
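As a reference point, reward clipping in the original IMPALA Atari/DMLab setups typically clamps each reward into [-1, 1]. Below is a minimal sketch of that transformation, assuming a symmetric threshold; the corresponding flag name and where the clipping is applied inside SEED RL may differ.

```python
import numpy as np


def clip_reward(reward, max_abs_reward=1.0):
  """Clamp the environment reward into [-max_abs_reward, max_abs_reward].

  Sketch only: the actual SEED RL flag and the place in the pipeline
  where clipping happens may differ from this standalone helper.
  """
  return float(np.clip(reward, -max_abs_reward, max_abs_reward))


# Example: raw Atari rewards of +4 or -200 become +1 / -1.
assert clip_reward(4.0) == 1.0
assert clip_reward(-200.0) == -1.0
```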

kimbring2 commented 2 years ago

@holger-m I found that actor.py has problems with the Atari and DMLab environments; it currently only works well with the Gfootball environment.

I can train Pong-v0 using a custom actor.py file, with rewards like the log below.

```
reward_sum: -8.0
reward_sum: -9.0
reward_sum: -11.0
reward_sum: -13.0
reward_sum: -7.0
reward_sum: -11.0
reward_sum: -3.0
reward_sum: -1.0
reward_sum: -1.0
reward_sum: -2.0
reward_sum: -11.0
reward_sum: -6.0
reward_sum: -6.0
reward_sum: -1.0
reward_sum: -6.0
reward_sum: -4.0
reward_sum: -4.0
reward_sum: -2.0
reward_sum: -8.0
reward_sum: -11.0
reward_sum: -9.0
reward_sum: -6.0
reward_sum: -10.0
reward_sum: 1.0
```

You can check that code on my GitHub.
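For anyone wanting to reproduce numbers like the log above, here is a minimal standalone episode loop in the same spirit. It is a generic Gym sketch, not the custom actor.py linked here: it assumes the older Gym API (Pong-v0, 4-tuple step return) and uses random actions as a stand-in for the trained policy.

```python
import gym


def run_episode(env):
  """Play one episode and return the accumulated reward (reward_sum)."""
  observation = env.reset()
  reward_sum = 0.0
  done = False
  while not done:
    action = env.action_space.sample()  # replace with the trained policy
    observation, reward, done, _ = env.step(action)
    reward_sum += reward
  return reward_sum


if __name__ == "__main__":
  env = gym.make("Pong-v0")
  for _ in range(5):
    print("reward_sum:", run_episode(env))
```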