alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

Envpool mujoco tuning #216

Closed alex-petrenko closed 1 year ago

alex-petrenko commented 1 year ago

@edbeeching I updated some parameters to be more in line with https://github.com/Denys88/rl_games/blob/master/docs/MUJOCO_ENVPOOL.md and I think the performance is substantially better (both sample efficiency and throughput). Please let me know what you think!

Both rl_games and our config use only 64 agents, which is quite a small number (in IGE we use 4K-8K). I tried playing with the params a bit more (more agents, larger batch, shorter rollout), and I can easily get the same wall-time performance (so better FPS but worse sample efficiency), but with the configs I tried I could not get a better wall-time reward than this baseline config.
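To make the trade-off above concrete, here is a minimal sketch of the arithmetic: wall time to a target reward is the number of frames needed divided by FPS, so a config with higher FPS but proportionally worse sample efficiency ends up at the same wall time. `RunConfig` is a hypothetical helper, not a Sample Factory class, and every number except the 64 agents is a placeholder chosen for illustration, not a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical summary of one training config (not a Sample Factory class)."""

    name: str
    num_agents: int           # parallel environments
    rollout: int              # steps collected per agent before an update
    fps: float                # environment frames per second (placeholder)
    frames_to_target: float   # frames needed to reach a target reward (placeholder)

    @property
    def frames_per_iteration(self) -> int:
        # One sampling iteration collects `rollout` steps from every agent.
        return self.num_agents * self.rollout

    @property
    def wall_time_s(self) -> float:
        # Wall time = sample cost divided by throughput.
        return self.frames_to_target / self.fps


# Placeholder values: only num_agents=64 comes from the discussion above.
baseline = RunConfig("baseline", num_agents=64, rollout=64,
                     fps=10_000, frames_to_target=5_000_000)
scaled = RunConfig("more agents, shorter rollout", num_agents=1024, rollout=16,
                   fps=40_000, frames_to_target=20_000_000)

for cfg in (baseline, scaled):
    print(f"{cfg.name}: {cfg.frames_per_iteration} frames/iter, "
          f"{cfg.wall_time_s:.0f} s to target")
```

With these made-up numbers the scaled config has 4x the FPS but needs 4x the frames, so both reach the target in the same wall time. A config beats the baseline on wall time only if its FPS gain exceeds its loss in sample efficiency.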

I also noticed that on some seeds the result is a bit worse than rl_games (on Ant I got ~5000 reward at 5M and rl_games is ~6000 reward). This can be addressed later. It should be a good task for @andrewzhang505 to figure out in the future.

codecov-commenter commented 1 year ago

Codecov Report

Base: 79.86% // Head: 80.10% // Increases project coverage by +0.24% :tada:

Coverage data is based on head (4fa05ff) compared to base (0b1d3fc). Patch coverage: 83.33% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              sf2     #216      +/-   ##
==========================================
+ Coverage   79.86%   80.10%   +0.24%
==========================================
  Files          91       92       +1
  Lines        7399     7404       +5
==========================================
+ Hits         5909     5931      +22
+ Misses       1490     1473      -17
```

| [Impacted Files](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko) | Coverage Δ | |
|---|---|---|
| [sample\_factory/utils/algo\_version.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvdXRpbHMvYWxnb192ZXJzaW9uLnB5) | `0.00% <0.00%> (ø)` | |
| [sample\_factory/utils/wandb\_utils.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvdXRpbHMvd2FuZGJfdXRpbHMucHk=) | `26.92% <ø> (ø)` | |
| [sf\_examples/envpool/mujoco/envpool\_mujoco\_utils.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2ZfZXhhbXBsZXMvZW52cG9vbC9tdWpvY28vZW52cG9vbF9tdWpvY29fdXRpbHMucHk=) | `79.16% <0.00%> (-4.17%)` | :arrow_down: |
| [sf\_examples/mujoco/mujoco\_params.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2ZfZXhhbXBsZXMvbXVqb2NvL211am9jb19wYXJhbXMucHk=) | `100.00% <ø> (ø)` | |
| [sample\_factory/algo/runners/runner.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvYWxnby9ydW5uZXJzL3J1bm5lci5weQ==) | `87.95% <100.00%> (+0.04%)` | :arrow_up: |
| [...f\_examples/envpool/mujoco/envpool\_mujoco\_params.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2ZfZXhhbXBsZXMvZW52cG9vbC9tdWpvY28vZW52cG9vbF9tdWpvY29fcGFyYW1zLnB5) | `100.00% <100.00%> (ø)` | |
| [sf\_examples/envpool/mujoco/train\_envpool\_mujoco.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2ZfZXhhbXBsZXMvZW52cG9vbC9tdWpvY28vdHJhaW5fZW52cG9vbF9tdWpvY28ucHk=) | `79.16% <100.00%> (+0.90%)` | :arrow_up: |
| [sample\_factory/algo/learning/learner.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/216/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvYWxnby9sZWFybmluZy9sZWFybmVyLnB5) | `87.85% <0.00%> (+3.03%)` | :arrow_up: |
