@holger-m I tried to train the explore_goal_locations_small map of DMLab using the default IMPALA parameters. The reward reaches a maximum of around 200, but then collapses to a low value after reaching that maximum.
I therefore ran DeepMind's original IMPALA code on the same environment. That code works well, as the log below shows.
As you mentioned, the num_action_repeats parameter in the original code is 4. I also found that the reward clipping parameter differs from the original.
I am now training SEED RL with the reward clipping parameter set to 1.0 and will report back if it works.
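For clarity, setting the reward clipping to 1.0 here presumably means clipping every environment reward into [-1, 1] before the learner sees it, which is what the original IMPALA's abs_one clipping does. A minimal sketch of that clipping (the helper name is made up, and where exactly SEED RL would apply it is an assumption):

```python
import numpy as np

def clip_reward(reward, clip_value=1.0):
  # Clip each reward into [-clip_value, clip_value]; with clip_value == 1.0 this
  # corresponds to the abs_one reward clipping used by the original IMPALA code.
  return np.clip(reward, -clip_value, clip_value)
```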
@holger-m I found that actor.py has problems with the Atari and DMLab environments. It currently only works well with the GFootball environment.
I can train Pong-v0 using a custom actor.py file, with rewards like the ones below.
reward_sum: -8.0 reward_sum: -9.0 reward_sum: -11.0 reward_sum: -13.0 reward_sum: -7.0 reward_sum: -11.0 reward_sum: -3.0 reward_sum: -1.0 reward_sum: -1.0 reward_sum: -2.0 reward_sum: -11.0 reward_sum: -6.0 reward_sum: -6.0 reward_sum: -1.0 reward_sum: -6.0 reward_sum: -4.0 reward_sum: -4.0 reward_sum: -2.0 reward_sum: -8.0 reward_sum: -11.0 reward_sum: -9.0 reward_sum: -6.0 reward_sum: -10.0 reward_sum: 1.0
You can check that code on my GitHub.
In common_flags.py the num_action_repeats flag is set to 1, which seems to cause an issue for Atari. It looks like atari_preprocessing.py generates a constant screen with this flag setting. Maybe num_action_repeats should be set to 4 instead. This might be related to issues #75 and #51.
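For anyone hitting this, here is a rough sketch of the dopamine-style action-repeat loop that atari_preprocessing.py appears to follow; the class and exact indexing below are an illustration rather than the repo's code, but they show how num_action_repeats=1 can leave the returned screen stuck at the reset frame:

```python
import numpy as np

class FrameSkipSketch:
  """Illustrative dopamine-style action-repeat wrapper (not the repo's exact code)."""

  def __init__(self, env, frame_skip):
    self.env = env
    self.frame_skip = frame_skip
    # Two-slot buffer used to max-pool the last two raw frames of each repeat.
    self.screen_buffer = [None, None]

  def reset(self):
    frame = self.env.reset()
    self.screen_buffer[0] = self._to_gray(frame)
    self.screen_buffer[1] = np.zeros_like(self.screen_buffer[0])
    return self.screen_buffer[0]

  def step(self, action):
    total_reward, done, info = 0.0, False, {}
    for t in range(self.frame_skip):
      frame, reward, done, info = self.env.step(action)
      total_reward += reward
      if done:
        break
      if t >= self.frame_skip - 2:
        # With frame_skip == 1 this index evaluates to 1 on every step,
        # so new frames only ever land in slot 1 ...
        idx = t - (self.frame_skip - 2)
        self.screen_buffer[idx] = self._to_gray(frame)
    if self.frame_skip > 1:
      # ... and with frame_skip == 1 this pooling step is skipped, so slot 0,
      # written only at reset, is returned on every step: a constant screen.
      np.maximum(self.screen_buffer[0], self.screen_buffer[1],
                 out=self.screen_buffer[0])
    return self.screen_buffer[0], total_reward, done, info

  @staticmethod
  def _to_gray(frame):
    # Crude grayscale conversion, just for the sketch.
    return frame.mean(axis=-1).astype(np.uint8)
```

If the repo's preprocessing matches this pattern, setting num_action_repeats to 4 (as in the original IMPALA setup) would avoid the degenerate indexing entirely.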