DRL training with vectorized environment and asynchronous time-steps

akmandor commented 11 months ago

Background: I am trying to implement an architecture similar to the one in this work: ReLMoGen.

The main difference between their framework and a regular RL pipeline is that, instead of learning atomic actions (such as joint commands), they learn to set some parameters of a motion planner (such as which joint to control, the target, etc.).
Hence, instead of applying an action and getting a reward at each dt (let's say the atomic time step), the DRL pipeline needs to wait for an action horizon (n*dt) to complete one action step.
In the Appendix of the full paper, the authors state that they parallelize the training 16 times.
I created a vectorized environment using stable_baselines3_example.py and integrated it with ROS, as explained here, to parallelize the training with Stable-Baselines3 while setting the action commands by subscribing to a ROS topic.

Issue description: When the training is parallelized (i.e., vectorized), the agents seem to wait for each other until all of them have completed the current action.

To test this, I added a 1 sec sleep to each action step of robot1 and kept robot0 at the default (no sleep).
I set the maximum number of steps per episode to 15.
I set dt = 0.1 sec for each action step.
Given those settings, I expected robot0 to reset every 1.5 sec (15 * 0.1) and robot1 to reset every 16.5 sec (15 * 1.1).
However, both reset after 16.5 sec and move every 1.1 sec, which means that robot0 waits for robot1 at every step.
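
The numbers above can be reproduced with a small timing model. This is only a sketch of the arithmetic, not iGibson or Stable-Baselines3 code; `episode_durations` is a hypothetical helper, and it assumes that a synchronous vectorized step lasts as long as the slowest environment's step.

```python
def episode_durations(step_times, max_steps, synchronous):
    """Return the wall-clock seconds each environment needs for one episode.

    step_times:  seconds per action step for each environment (dt plus any delay).
    synchronous: if True, every step lasts as long as the slowest environment's
                 step (assumed lock-step behavior); otherwise each environment
                 runs at its own pace.
    """
    if synchronous:
        slowest = max(step_times)
        return [round(slowest * max_steps, 6) for _ in step_times]
    return [round(t * max_steps, 6) for t in step_times]


DT = 0.1        # seconds per action step, from the test setup
SLEEP = 1.0     # extra delay added to robot1 only
MAX_STEPS = 15  # maximum steps per episode

step_times = [DT, DT + SLEEP]  # robot0, robot1

expected = episode_durations(step_times, MAX_STEPS, synchronous=False)
observed = episode_durations(step_times, MAX_STEPS, synchronous=True)
print("independent stepping:", expected)
print("lock-step stepping:  ", observed)
```

The independent case matches the reset times I expected, and the lock-step case matches what I actually measured.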

Question: Is there any way to set up the training pipeline, using iGibson (PyBullet) and Stable-Baselines3, to enable asynchronous time steps? I would really appreciate it if you could guide me on this issue or suggest other alternatives for implementing this architecture.

ChengshuLi commented 11 months ago

@akmandor You are right that parallelized environments, as implemented in SB3, are bottlenecked by the slowest environment. This is typically not an issue if all environment steps take approximately the same amount of time (hence there is no real bottleneck). Some other RL packages (other than Stable-Baselines3) might provide such functionality, but I am not entirely sure. Since this is technically not an issue with iGibson but rather with RL framework parallelization, I will close the issue for now. Feel free to re-open if you need additional help! Thanks!
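
For illustration, the general pattern of asynchronous collection can be sketched with the Python standard library alone. This is not the API of SB3 or of any other RL package: `run_worker` and `collect_async` are hypothetical names, the environments are simulated with `time.sleep`, and a real setup would put actual environment steps and a learner in their place. Each worker steps at its own pace and pushes transitions into a shared queue, so the consumer never waits for the slowest environment.

```python
import queue
import threading
import time


def run_worker(env_id, step_seconds, num_steps, out_queue):
    """Step one simulated environment at its own pace and report each transition."""
    for step in range(num_steps):
        time.sleep(step_seconds)        # stand-in for one action step of n*dt
        out_queue.put((env_id, step))
    out_queue.put((env_id, None))       # sentinel: this worker has finished


def collect_async(step_seconds_per_env, num_steps):
    """Gather transitions in arrival order instead of in lock-step batches."""
    transitions = queue.Queue()
    workers = [
        threading.Thread(target=run_worker, args=(i, s, num_steps, transitions))
        for i, s in enumerate(step_seconds_per_env)
    ]
    for worker in workers:
        worker.start()

    arrival_order = []
    finished = 0
    while finished < len(workers):
        env_id, step = transitions.get()  # blocks only until ANY env reports
        if step is None:
            finished += 1
        else:
            arrival_order.append((env_id, step))

    for worker in workers:
        worker.join()
    return arrival_order


# Environment 0 is fast, environment 1 is slow (durations are arbitrary).
order = collect_async([0.01, 0.05], num_steps=3)
print(order)
```

The trade-off is that transitions arrive out of step across environments, so the learning algorithm has to tolerate data collected at different times, which is why off-policy or explicitly asynchronous methods are the usual fit for this pattern.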

akmandor commented 10 months ago

Thank you so much for clarifying the issue for me. I might take a look at other available RL libraries, such as Ray's RLlib, to solve that training bottleneck.

StanfordVL / iGibson

DRL training with vectorized environment and asynchronous time-steps #349