AI4Finance-Foundation / ElegantRL

Massively Parallel Deep Reinforcement Learning. 🔥
https://ai4finance.org

How to crack the BipedalWalkerHardcore using eRL in colab? #24

Closed ulstaer closed 2 years ago

ulstaer commented 3 years ago

How can I reproduce the success shown in this video (https://www.bilibili.com/video/BV1wi4y187tC) using ElegantRL (eRL) in Colab?

I tried:

import gym
from elegantrl.agent import AgentSAC  # import paths assumed from ElegantRL's layout at the time
from elegantrl.env import PreprocessEnv
from elegantrl.run import Arguments, train_and_evaluate_mp

args = Arguments(if_on_policy=False)  # SAC is off-policy
args.agent = AgentSAC()
args.env = PreprocessEnv(gym.make('BipedalWalkerHardcore-v3'))
args.reward_scale = 2 ** -1  # halve rewards before learning
args.gamma = 0.95
args.rollout_num = 2  # number of rollout workers
args.if_remove = False  # keep the existing working directory
train_and_evaluate_mp(args)

After a million steps (about 3 hours), the agent's best episode return (MaxR) was only 37.

Yonv1943 commented 3 years ago

args.gamma = 0.99 is better than 0.95
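A larger gamma extends the agent's effective planning horizon, roughly 1 / (1 - gamma) steps, which matters in BipedalWalkerHardcore's long episodes where an action's payoff can arrive hundreds of steps later. A quick check (plain Python, no ElegantRL required):

```python
def effective_horizon(gamma: float) -> int:
    # Roughly how many steps ahead the agent "cares about":
    # discounting shrinks rewards beyond ~1 / (1 - gamma) steps toward zero.
    return round(1.0 / (1.0 - gamma))

print(effective_horizon(0.95))  # 20 steps
print(effective_horizon(0.99))  # 100 steps
```

With gamma = 0.95 the agent effectively optimizes over only ~20 steps, far shorter than a hardcore episode; 0.99 gives a 100-step horizon.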

args.agent = AgentModSAC() is better than AgentSAC() (plain AgentSAC cannot pass 'BipedalWalkerHardcore-v3')

Ok, I will add 'BipedalWalkerHardcore-v3' to demo.py.

We have fully upgraded ElegantRL, and it now supports multi-GPU training (1 to 8 GPUs).

The problem you mentioned has now been resolved. Sorry for the slow reply; we have been busy developing the 80-GPU (cloud platform) version of ElegantRL and were unable to respond to you in time.

Yonv1943 commented 3 years ago

> How can I reproduce the success shown in the video using eRL in Colab? [...] After a million steps (3 hours), the agent scored 37 points (MaxR).

I have added a demo for BipedalWalkerHardcore-v3 in elegantrl/demo.py line 53. I have not had time to fine-tune on this env yet.


    if_train_bipedal_walker_hard_core = 0  # set to 1 to run this demo
    if if_train_bipedal_walker_hard_core:
        "TotalStep: 10e5, TargetReward:   0, UsedTime: 10ks ModSAC"
        "TotalStep: 25e5, TargetReward: 150, UsedTime: 20ks ModSAC"
        "TotalStep: 35e5, TargetReward: 295, UsedTime: 40ks ModSAC"
        "TotalStep: 40e5, TargetReward: 300, UsedTime: 50ks ModSAC"
        args.env = build_env(env='BipedalWalkerHardcore-v3')
        args.target_step = args.env.max_step
        args.gamma = 0.98
        args.net_dim = 2 ** 8
        args.batch_size = args.net_dim * 2
        args.learning_rate = 2 ** -15
        args.repeat_times = 1.5

        args.max_memo = 2 ** 22
        args.break_step = 2 ** 24

        args.eval_gap = 2 ** 8
        args.eval_times1 = 2 ** 2
        args.eval_times2 = 2 ** 5

        args.target_step = args.env.max_step * 1

    # train_and_evaluate(args)  # single process
    args.worker_num = 4
    args.visible_gpu = sys.argv[-1]
    train_and_evaluate_mp(args)  # multiple process
    # args.worker_num = 4
    # args.visible_gpu = '0,1'
    # train_and_evaluate_mp(args)  # multiple GPU
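For readers skimming the demo above, the power-of-two hyperparameters and the quoted ModSAC milestones decode to more familiar magnitudes (plain Python, using only the numbers already shown):

```python
# Decode the power-of-two hyperparameters from the demo above.
net_dim = 2 ** 8          # 256 hidden units per layer
batch_size = net_dim * 2  # 512
learning_rate = 2 ** -15  # ~3.05e-5
max_memo = 2 ** 22        # replay buffer of ~4.19 million transitions
break_step = 2 ** 24      # stop after ~16.8 million environment steps
print(net_dim, batch_size, max_memo, break_step)  # 256 512 4194304 16777216
print(f"{learning_rate:.2e}")  # 3.05e-05

# Rough throughput implied by the final milestone:
# "TotalStep: 40e5 ... UsedTime: 50ks" -> 4,000,000 steps in ~50,000 s.
print(40e5 / 50_000)  # 80.0 environment steps per second
```

So reaching the final milestone takes on the order of 14 wall-clock hours at that throughput, consistent with the "UsedTime: 50ks" figure.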

Here are two results of ElegantRL's ModSAC on BipedalWalkerHardcore-v3, both just under the 300 episode-return threshold at which the environment is considered solved:

[attached result plots: BWHC_297 and BWHC_295]