Denys88 / rl_games

RL implementations
MIT License

EnvPool advertisement #164

Closed: Trinkle23897 closed this issue 1 year ago

Trinkle23897 commented 2 years ago

Hi, I just came across this repo. I'm quite surprised that you used envpool to achieve 2 min Pong and 20 min Breakout, nice work!

I'm wondering if you'd like to open a pull request at EnvPool to link to your results (like the CleanRL ones), and whether we could include your experiment results in our upcoming arXiv paper. It would also be great if you could produce more amazing results based on the EnvPool MuJoCo tasks (which are aligned with gym's implementation and also get a free speedup). Thanks!
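
For context, EnvPool exposes its MuJoCo tasks through a batched, gym-compatible handle. A minimal usage sketch (the task id and num_envs are illustrative choices, not values from this thread):

```python
# Minimal sketch of EnvPool's batched gym-style API; "Ant-v3" and
# num_envs=16 are illustrative, not values taken from this thread.
import numpy as np
import envpool

# One handle drives 16 synchronized Ant-v3 instances.
env = envpool.make("Ant-v3", env_type="gym", num_envs=16)
obs = env.reset()  # shape: (16, obs_dim)

for _ in range(100):
    # Random actions just to exercise the batched step; a real agent
    # would produce these from its policy network.
    actions = np.stack([env.action_space.sample() for _ in range(16)])
    obs, rew, done, info = env.step(actions)

print("batched obs shape:", obs.shape)
```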

BTW, isn't it a typo? https://github.com/Denys88/rl_games/blame/master/docs/ATARI_ENVPOOL.md#L9

-* **Breakout-v3** 20 minutes training time to achieve 20+ score.
+* **Breakout-v3** 20 minutes training time to achieve 400+ score.

Denys88 commented 2 years ago

Oh yes, it is a typo, thanks! EnvPool is really fast! We will prepare a PR by the end of the week. We already got very good results with the EnvPool MuJoCo Humanoid (plot: mujoco_humanoid_rl_games). We are going to post updated results today or tomorrow.

Right now I am working on a Google Colab example that trains Walker2d with EnvPool in a few minutes, and I need to switch to the original mujoco-py for visualization. It would also be nice to have a render function for EnvPool envs. The Humanoid policy doesn't work with the original mujoco-py env; the other envs are fine. mujoco-py uses an outdated MuJoCo version: 2.1.0 vs 2.1.5 in EnvPool.
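
Until EnvPool gains a render function, one workaround is to train with EnvPool and replay the policy in the standard mujoco-py-backed gym env purely for visualization. A rough sketch, where `policy` is a hypothetical obs-to-action callable standing in for the trained agent:

```python
# Hedged sketch: replay an EnvPool-trained policy in the classic gym env
# (mujoco-py backend) just for rendering; EnvPool itself has no render().
# `policy` is a hypothetical obs -> action callable, not a real API.
import gym

env = gym.make("Walker2d-v3")  # same task id as used for training
obs = env.reset()
done = False
while not done:
    env.render()               # opens the mujoco-py viewer
    action = policy(obs)       # stand-in for the trained agent
    obs, reward, done, info = env.step(action)
env.close()
```

As noted above, this transfer can fail when the two backends disagree on the MuJoCo version (2.1.0 in mujoco-py vs 2.1.5 in EnvPool), which is likely why the Humanoid policy does not carry over.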

Trinkle23897 commented 2 years ago

BTW, could you please use the newest version (0.6.1.post1) to verify the final reward on Ant-v3 and Humanoid-v3? Some changes have been made, and I'm not sure whether they break consistency.
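
One cheap cross-version smoke test, sketched here as an assumption rather than an established recipe: fix the seed, drive the env with a seeded random policy, and diff the per-episode returns before and after upgrading. This does not replace retraining to check the final reward, but it catches gross behavioral changes.

```python
# Hedged sketch: seeded random-policy returns on Ant-v3, to diff across
# envpool versions (e.g. pip install envpool==0.6.1.post1 for the new one).
# The seed and episode count are arbitrary choices.
import numpy as np
import envpool

env = envpool.make("Ant-v3", env_type="gym", num_envs=1, seed=42)
rng = np.random.default_rng(42)
returns = []
for _ in range(5):
    obs, ep_ret, done = env.reset(), 0.0, np.array([False])
    while not done[0]:
        # Seeded random actions so both versions see identical inputs.
        action = rng.uniform(-1.0, 1.0, env.action_space.shape)
        action = action.astype(np.float32)[None]  # batch of one
        obs, rew, done, info = env.step(action)
        ep_ret += float(rew[0])
    returns.append(ep_ret)
print("per-episode returns:", np.round(returns, 2))
```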

ViktorM commented 2 years ago

@Trinkle23897 the Breakout-v3 typo has been fixed. And BTW, I got this MuJoCo Humanoid result by training on my laptop with an 11th Gen Intel® Core™ i9-11980HK @ 2.60GHz × 16 and an RTX 3080; it was not even a desktop. Training with EnvPool was extremely fast.

Just started training MuJoCo Humanoid with the latest EnvPool. You updated it really fast to the newly open-sourced MuJoCo version!

ViktorM commented 2 years ago

[mujoco_perf_new]

@Trinkle23897 Humanoid works just as well; blue is the new run.

ViktorM commented 2 years ago

Ant also works well!

Trinkle23897 commented 2 years ago

Great! Would you like to be one of the authors of our paper?

ViktorM commented 2 years ago

Denys and I would be happy to be co-authors of the paper with you.

Trinkle23897 commented 2 years ago

Let me set up the Discord server so that we can discuss the details there.

Denys88 commented 2 years ago

BTW, you can join our Discord too: https://discord.gg/hnYRq7DsQh

Trinkle23897 commented 2 years ago

Another request: I'm trying to build envpool from the MuJoCo source code. However, there are some small precision issues (https://github.com/deepmind/mujoco/issues/294). The corresponding wheels are at https://github.com/sail-sg/envpool/actions/runs/2381544251. I'm not sure whether this will affect the benchmark results. If possible, could you please also run some experiments to verify?
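
A sketch of one way to run that verification (file names, task id, and tolerance are my own choices): record an observation trace under a fixed action sequence with each wheel installed in turn, then compare the traces offline, which should surface the kind of small numerical drift mentioned in the mujoco issue.

```python
# Hedged sketch for the precision check: run the same fixed action
# sequence under each envpool wheel and save the observation trace.
# Run once per installed wheel, changing OUT (file names are mine).
import numpy as np
import envpool

OUT = "obs_trace_new_wheel.npz"  # e.g. "obs_trace_old_wheel.npz" for the other build

env = envpool.make("HalfCheetah-v3", env_type="gym", num_envs=1, seed=0)
rng = np.random.default_rng(0)
obs_trace = [env.reset()]
for _ in range(200):
    action = rng.uniform(-1.0, 1.0, env.action_space.shape)
    obs, rew, done, info = env.step(action.astype(np.float32)[None])
    obs_trace.append(obs)
np.savez(OUT, obs=np.concatenate(obs_trace))

# Then, with both traces on disk:
#   a = np.load("obs_trace_old_wheel.npz")["obs"]
#   b = np.load("obs_trace_new_wheel.npz")["obs"]
#   print(np.allclose(a, b, atol=1e-6))  # tolerance is an arbitrary choice
```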

ViktorM commented 2 years ago

@Trinkle23897 I can test Ant and Humanoid after finishing the ongoing experiments. BTW, do you plan to support the dm_control multi-agent envs: https://github.com/deepmind/dm_control/blob/main/dm_control/locomotion/soccer/README.md ?

If so, we can run self-play experiments with rl_games and EnvPool as well, starting with the simplest env.

Benjamin-eecs commented 2 years ago

@ViktorM Yes, EnvPool plans to support all tasks in dm_control.locomotion, and the multi-agent soccer will be supported too. It can be one of the multi-agent envs that EnvPool supports.

ViktorM commented 2 years ago

@Benjamin-eecs thank you! Looking forward to soccer with EnvPool. We already have some interesting results with the simplest BoxHead 1v1 version. With the EnvPool speedup we'll be able to train 2v2 and maybe even the Ant version!