Closed ulstaer closed 2 years ago
`args.gamma = 0.99` is better than 0.95.
`args.agent = AgentModSAC()` is better than `AgentSAC()` (plain AgentSAC can't pass 'BipedalWalkerHardcore-v3').
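Applied together, the two suggestions above look like this (a config sketch, assuming the `Arguments` / `AgentModSAC` names used elsewhere in this thread; not a complete training script):

```python
args = Arguments(if_on_policy=False)  # SAC-family agents are off-policy
args.agent = AgentModSAC()            # ModSAC instead of plain SAC
args.gamma = 0.99                     # 0.99 instead of 0.95
```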
Ok, I will add 'BipedalWalkerHardcore-v3' to demo.py
We have fully upgraded ElegantRL, and it now supports multi-GPU training (1 to 8 GPUs).
The problem you mentioned has now been resolved. I'm sorry we could not reply to you in time; we have been busy developing the 80-GPU (cloud platform) version of ElegantRL.
How can I repeat the success shown in this video (https://www.bilibili.com/video/BV1wi4y187tC) using ElegantRL (eRL) in Colab?
I tried:
```python
args = Arguments(if_on_policy=False)
args.agent = AgentSAC()
args.env = PreprocessEnv(gym.make('BipedalWalkerHardcore-v3'))
args.reward_scale = 2 ** -1
args.gamma = 0.95
args.rollout_num = 2
args.if_remove = False
train_and_evaluate_mp(args)
```
After a million steps (3 hours), the agent scored 37 points (MaxR).
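One likely reason `gamma = 0.95` underperforms here: hardcore episodes can run for a couple of thousand steps, while the discount factor sets the effective credit-assignment horizon. A quick arithmetic sketch (the standard 1/(1-gamma) rule of thumb, not ElegantRL code):

```python
# Effective credit-assignment horizon is roughly 1 / (1 - gamma):
# how many future steps still carry meaningful weight in the return.
for gamma in (0.95, 0.98, 0.99):
    horizon = 1.0 / (1.0 - gamma)
    print(f"gamma={gamma}: horizon ~ {horizon:.0f} steps")
# gamma=0.95 -> ~20 steps, gamma=0.98 -> ~50, gamma=0.99 -> ~100
```

With gamma = 0.95 the agent effectively only "sees" about 20 steps ahead, which is short for this task; hence the advice to raise it.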
I have added a demo for BipedalWalkerHardcore-v3 in elegantrl/demo.py (line 53). I have not had time to fine-tune on this env yet.
```python
import sys

if_train_bipedal_walker_hard_core = 0
if if_train_bipedal_walker_hard_core:
    # Recorded training milestones for ModSAC on this env:
    "TotalStep: 10e5, TargetReward:   0, UsedTime: 10ks ModSAC"
    "TotalStep: 25e5, TargetReward: 150, UsedTime: 20ks ModSAC"
    "TotalStep: 35e5, TargetReward: 295, UsedTime: 40ks ModSAC"
    "TotalStep: 40e5, TargetReward: 300, UsedTime: 50ks ModSAC"

    args.env = build_env(env='BipedalWalkerHardcore-v3')
    args.target_step = args.env.max_step
    args.gamma = 0.98
    args.net_dim = 2 ** 8
    args.batch_size = args.net_dim * 2
    args.learning_rate = 2 ** -15
    args.repeat_times = 1.5
    args.max_memo = 2 ** 22
    args.break_step = 2 ** 24
    args.eval_gap = 2 ** 8
    args.eval_times1 = 2 ** 2
    args.eval_times2 = 2 ** 5
    args.target_step = args.env.max_step * 1

    # train_and_evaluate(args)  # single process
    args.worker_num = 4
    args.visible_gpu = sys.argv[-1]
    train_and_evaluate_mp(args)  # multiple processes

    # args.worker_num = 4
    # args.visible_gpu = '0,1'
    # train_and_evaluate_mp(args)  # multiple GPUs
```
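For readers decoding the power-of-two settings above, they expand to these concrete values (plain arithmetic, no ElegantRL dependency):

```python
net_dim = 2 ** 8            # 256 hidden units per layer
batch_size = net_dim * 2    # 512 samples per gradient update
learning_rate = 2 ** -15    # about 3.05e-05
max_memo = 2 ** 22          # replay buffer of 4,194,304 transitions
break_step = 2 ** 24        # stop after 16,777,216 environment steps

print(net_dim, batch_size, learning_rate, max_memo, break_step)
```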
Here are two results of ElegantRL's ModSAC on BipedalWalkerHardcore-v3: