Hi, can you share a configuration that reproduces the results shown in the figure?
I ran the default M1 configuration and only get an average episodic reward of around 3.
I tried changing the configuration, such as setting action_repeat = 4, changing learning_rate, and adding double_q and duel_q, but there was not much change.
Many thanks!