DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] DRL training with vectorized environments and asynchronous time-steps #1786

Closed: akmandor closed this issue 8 months ago

akmandor commented 9 months ago

❓ Question

Background: I am trying to implement an architecture similar to the one in this work: ReLMoGen.

Issue Description: When the environments are parallelized (i.e., vectorized), the agents appear to wait for each other until both have completed their actions.

Question: As far as I understand, this issue does not depend on which simulator I use; it comes from how Stable-Baselines3 implements the training loop for vectorized environments. Is there a way (ideally an easy one, such as enabling a flag on a wrapper) to enable asynchronous training with vectorized environments in stable-baselines3? A minimal sketch of the kind of setup I mean is shown below.
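For reference, a minimal sketch of the vectorized training setup in question (the environment and hyperparameters are placeholders, not the actual project code; assumes SB3 ≥ 2.0 with Gymnasium):

```python
# Sketch of a vectorized SAC training run with two worker processes.
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # "Pendulum-v1" stands in for the custom robot / simulator environment.
    vec_env = make_vec_env("Pendulum-v1", n_envs=2, vec_env_cls=SubprocVecEnv)
    model = SAC("MlpPolicy", vec_env, verbose=1)
    # Inside learn(), every call to vec_env.step() waits for *both* workers,
    # which is the lock-step behavior described above.
    model.learn(total_timesteps=10_000)
```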


araffin commented 8 months ago

Hello, I guess it is a duplicate of https://github.com/DLR-RM/stable-baselines3/issues/715? We have a proof of concept for SAC with asynchronous training in the RL Zoo.
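For illustration only (this is not the code from the RL Zoo proof of concept), the idea behind asynchronous off-policy training is to decouple environment stepping from gradient updates, e.g. a collector thread and a learner thread sharing a replay buffer:

```python
# Toy sketch: slow environment steps no longer block gradient updates.
import random
import threading
import time
from collections import deque

replay_buffer = deque(maxlen=10_000)
stop_event = threading.Event()


def collector():
    """Stand-in for an actor stepping a slow environment and storing transitions."""
    while not stop_event.is_set():
        time.sleep(0.1)  # expensive env.step() / motion planner call
        replay_buffer.append(("obs", "action", "reward", "next_obs"))


def learner():
    """Stand-in for gradient updates sampled from the buffer, independent of stepping."""
    updates = 0
    while not stop_event.is_set():
        if len(replay_buffer) >= 32:
            _batch = random.sample(list(replay_buffer), 32)  # would feed an SGD step
            updates += 1
        time.sleep(0.01)
    print(f"performed {updates} updates")


if __name__ == "__main__":
    threads = [threading.Thread(target=collector), threading.Thread(target=learner)]
    for t in threads:
        t.start()
    time.sleep(2.0)
    stop_event.set()
    for t in threads:
        t.join()
```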

And yes, if you use SubprocVecEnv, there will be a synchronization step after each step.
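A small self-contained example (toy environment, assumes SB3 ≥ 2.0 with Gymnasium) that makes this synchronization visible: one vectorized step takes roughly as long as the slowest worker:

```python
import time

import gymnasium as gym
import numpy as np
from stable_baselines3.common.vec_env import SubprocVecEnv


class SlowStepEnv(gym.Env):
    """Toy env whose step duration is controlled by a per-instance delay."""

    def __init__(self, delay: float):
        super().__init__()
        self.delay = delay
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        time.sleep(self.delay)  # simulate a slow simulator / motion planner
        return np.zeros(1, dtype=np.float32), 0.0, False, False, {}


if __name__ == "__main__":
    # One fast worker (0.01 s/step) and one slow worker (0.5 s/step).
    vec_env = SubprocVecEnv([lambda: SlowStepEnv(0.01), lambda: SlowStepEnv(0.5)])
    vec_env.reset()
    start = time.time()
    vec_env.step(np.zeros((2, 1), dtype=np.float32))
    # Elapsed time is dominated by the slowest worker (~0.5 s):
    # the fast environment waits for the slow one.
    print(f"One vectorized step took {time.time() - start:.2f} s")
    vec_env.close()
```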

Alternatively, if a single step is very long anyway, you could use a more sample-efficient approach (that also requires more compute) like DroQ, implemented in SBX: https://github.com/araffin/sbx (note: SBX = SB3 + Jax and only covers a subset of SB3 features).
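A minimal sketch of such a setup, assuming the DroQ class exposes the usual SB3-style constructor and learn() API shown in the SBX README (newer SBX versions configure DroQ through SAC policy kwargs, so check the README for the current interface):

```python
# Sample-efficient training: several gradient steps per (expensive) env step.
from sbx import DroQ  # assumes the DroQ class from the SBX README

model = DroQ(
    "MlpPolicy",
    "Pendulum-v1",       # placeholder for the custom environment
    learning_starts=100,
    gradient_steps=20,   # many updates per environment step
    verbose=1,
)
model.learn(total_timesteps=5_000)
```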