StanfordVL / iGibson

A Simulation Environment to train Robots in Large Realistic Interactive Scenes
http://svl.stanford.edu/igibson
MIT License

Proper way to parallelize envs #77

Closed sparisi closed 3 years ago

sparisi commented 3 years ago

My code does something like:

from torch import multiprocessing as mp
...
ctx = mp.get_context('fork')
...
env = iGibsonEnv(...) # To get observation and action spaces
...
for i in range(n):
    actor = ctx.Process(
        target=act,
        args=(...))
    actor.start()

Within act() I try to create a new environment, but it fails as reported in https://github.com/StanfordVL/iGibson/issues/71#issue-839396005. Unlike what that user reported, putting the process to sleep for a few seconds did not fix the issue.
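A minimal sketch of a pattern often used with simulators that hold a GPU/OpenGL context: start workers with the "spawn" method instead of "fork", and build each environment inside the worker. It assumes, without confirmation from this thread, that the failure comes from state (such as a renderer context) that does not survive fork(). `DummyEnv`, `make_env`, and `run_actors` are hypothetical stand-ins for `iGibsonEnv(...)` and the actor loop, so the sketch runs without iGibson installed.

```python
import multiprocessing as mp


class DummyEnv:
    """Hypothetical stand-in for iGibsonEnv(...); episodes last 5 steps."""

    def __init__(self, seed):
        self.seed = seed
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return self.t, 0.0, done, {}


def make_env(seed):
    # Hypothetical factory: in real code this would construct iGibsonEnv(...).
    return DummyEnv(seed)


def act(actor_id, num_steps, results):
    # Build the environment here, inside the child, so each process owns its
    # own simulator instead of inheriting the parent's copy.
    env = make_env(seed=actor_id)
    env.reset()
    episodes = 0
    for _ in range(num_steps):
        _, _, done, _ = env.step(action=None)
        if done:
            episodes += 1
            env.reset()
    results.put((actor_id, episodes))


def run_actors(n, num_steps):
    # "spawn" starts every child from a fresh interpreter rather than copying
    # the parent's memory with fork(); assumed here to be the safer choice for
    # processes that create a GPU/OpenGL context.
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    actors = [ctx.Process(target=act, args=(i, num_steps, results)) for i in range(n)]
    for p in actors:
        p.start()
    # Collect results before join() so no child blocks on a full queue pipe.
    finished = dict(results.get(timeout=60) for _ in actors)
    for p in actors:
        p.join()
    return finished


if __name__ == "__main__":
    # The main-module guard is required with "spawn": children re-import this file.
    print(run_actors(n=2, num_steps=10))
```

With "spawn", a probe environment created in the parent only to read the observation and action spaces is not inherited by the children, because each child starts from a fresh interpreter and builds its own environment.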

The only way to run my code is to create new environments outside act() and pass them as arguments:

for i in range(n):
    actor = ctx.Process(
        target=act,
        args=(iGibsonEnv(...), ...))
    actor.start()

However, the code is extremely slow.

I tried using ParallelNavEnv, but it creates the environments in separate processes, which is not what I want.

What is the proper way to parallelize envs in this case? Thanks!

mjlbach commented 3 years ago

There are now several examples of parallelizing with Python multiprocessing (stable-baselines3) and Ray:
https://github.com/StanfordVL/iGibson/blob/ig-develop/igibson/examples/demo/stable_baselines3_example.py
https://github.com/StanfordVL/iGibson/blob/ig-develop/igibson/envs/igibson_rllib_env.py

mjlbach commented 3 years ago

Closing because I believe this is addressed by the examples above.

sycz00 commented 2 years ago

@mjlbach Hey, is stable_baselines3_example.py supposed to work after 1M steps? I am currently trying to create some unit tests to verify that every library works, and such an example would be extremely helpful :) Thanks in advance

mjlbach commented 2 years ago

Are you having an issue with it past 1 million steps?

sycz00 commented 2 years ago

No, I haven't finished training yet. I wanted to know whether it is a serious example of using SB3, since training for 1M steps takes quite some time on my resources.

mjlbach commented 2 years ago

stable_baselines3_example.py should converge; stable_baselines3_behavior_example.py will not.

sycz00 commented 2 years ago

Alright, thank you :) I am gonna try it out tonight and let you know if it works as expected :)
Edit: do you have any graphs for the example? I am not sure if this is correct.
[Screenshot from 2021-11-26 10-05-02]

Based on 50 evaluation episodes, the agent achieves a ~50% success rate. Is this more or less the expected rate? Thanks again for your help!
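The thread does not say what success rate the example should reach, but a short calculation shows how noisy a 50-episode estimate is. The sketch below computes a Wilson score interval for a binomial success rate; the choice of interval is a standard statistical one made here, not something from iGibson or SB3, and the input of 25 successes out of 50 is only an illustration of "~50%".

```python
import math


def success_rate_ci(successes, episodes, z=1.96):
    """Success rate plus a Wilson score interval (z=1.96 gives roughly 95%)."""
    if episodes <= 0:
        raise ValueError("need at least one episode")
    p_hat = successes / episodes
    denom = 1 + z * z / episodes
    center = (p_hat + z * z / (2 * episodes)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / episodes + z * z / (4 * episodes ** 2)) / denom
    return p_hat, center - margin, center + margin


if __name__ == "__main__":
    # Illustrative input: 25 successes out of 50 evaluation episodes (~50%).
    rate, low, high = success_rate_ci(25, 50)
    print(f"success rate {rate:.0%}, 95% CI [{low:.2f}, {high:.2f}]")
```

For 25/50 the interval is roughly 0.37 to 0.63, so a measured ~50% over 50 episodes is compatible with true success rates from the high 30s to the low 60s.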