RLE-Foundation / rllte

Long-Term Evolution Project of Reinforcement Learning
https://docs.rllte.dev/
MIT License

[Progress Report] Construction of RLLTE Data Hub #30

Open yuanmingqi opened 1 year ago

yuanmingqi commented 1 year ago

Because training requires substantial computing power, we will upload data to the data hub gradually and report progress in this issue. Training priorities can be adjusted on request, so leave a comment here if you need particular results sooner.

yuanmingqi commented 1 year ago

Environment: DMControl
Completed:

  1. Soft Actor-Critic (SAC): the 27 tasks reported in pytorch_sac. Two examples:
    • sac_dmc_state_humanoid_run (2 seeds, 10M steps)
    • sac_dmc_state_quadruped_walk (10 seeds, 2M steps)

Model import example:

from rllte.hub.models import DMControl

if __name__ == "__main__":
    # Load the SAC model trained on humanoid_run with seed 1.
    model = DMControl().load_models(
        agent="sac",
        env_id="humanoid_run",
        seed=1,
        device="cuda",  # device to load the model onto
    )
    print(model)

yuanmingqi commented 1 year ago

Environment: EnvPool Atari games (synchronous mode)
Completed:

  1. Proximal Policy Optimization (PPO): the 57 Atari games reported in "Agent57: Outperforming the Atari Human Benchmark". Two examples:
    • ppo_atari_Breakout-v5 (10 seeds, 10M steps)
    • ppo_atari_Pong-v5 (10 seeds, 10M steps)

Model import example:

from rllte.hub.models import Atari

if __name__ == "__main__":
    # Load the PPO model trained on Pong-v5 with seed 1.
    model = Atari().load_models(
        agent="ppo",
        env_id="Pong-v5",
        seed=1,
        device="cuda",  # device to load the model onto
    )
    print(model)

yuanmingqi commented 1 year ago

Environment: EnvPool Procgen games (synchronous mode)
Completed:

  1. Proximal Policy Optimization (PPO)
    • ppo_procgen_bigfish (10 seeds, 25M steps)
    • ppo_procgen_bossfight (10 seeds, 25M steps)
    • ppo_procgen_caveflyer (10 seeds, 25M steps)
    • ppo_procgen_chaser (10 seeds, 25M steps)
    • ppo_procgen_climber (10 seeds, 25M steps)
    • ppo_procgen_coinrun (10 seeds, 25M steps)
    • ppo_procgen_dodgeball (10 seeds, 25M steps)
    • ppo_procgen_fruitbot (10 seeds, 25M steps)
    • ppo_procgen_heist (10 seeds, 25M steps)
    • ppo_procgen_jumper (10 seeds, 25M steps)
    • ppo_procgen_leaper (10 seeds, 25M steps)
    • ppo_procgen_maze (10 seeds, 25M steps)
    • ppo_procgen_miner (10 seeds, 25M steps)
    • ppo_procgen_ninja (10 seeds, 25M steps)
    • ppo_procgen_plunder (10 seeds, 25M steps)
    • ppo_procgen_starpilot (10 seeds, 25M steps)

Model import example:

from rllte.hub.models import Procgen

if __name__ == "__main__":
    # Load the PPO model trained on bigfish with seed 1.
    model = Procgen().load_models(
        agent="ppo",
        env_id="bigfish",
        seed=1,
        device="cuda",  # device to load the model onto
    )
    print(model)
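
The example above loads a single game and seed. To work through everything listed in this comment, it can help to enumerate the entries first. The following is a minimal sketch only: it assumes the `ppo_procgen_<game>` naming pattern seen in the list above, and it assumes seeds are numbered 1 through 10 (only `seed=1` appears in the example, so the numbering is a guess). The helper `procgen_entries` is hypothetical and not part of rllte.

```python
from itertools import product

# The 16 Procgen games listed in this comment.
PROCGEN_GAMES = [
    "bigfish", "bossfight", "caveflyer", "chaser",
    "climber", "coinrun", "dodgeball", "fruitbot",
    "heist", "jumper", "leaper", "maze",
    "miner", "ninja", "plunder", "starpilot",
]


def procgen_entries(agent="ppo", games=PROCGEN_GAMES, seeds=range(1, 11)):
    """Yield (entry_name, env_id, seed) for every game/seed pair.

    Hypothetical helper: the seed range 1..10 is an assumption,
    not something the data hub documents here.
    """
    for game, seed in product(games, seeds):
        yield f"{agent}_procgen_{game}", game, seed


entries = list(procgen_entries())
print(len(entries))   # 16 games x 10 seeds
print(entries[0])
```

Each `(env_id, seed)` pair could then be passed to `Procgen().load_models(...)` with the same arguments as in the example above.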