Closed akmandor closed 10 months ago
Hello, I guess this is a duplicate of https://github.com/DLR-RM/stable-baselines3/issues/715? We have a proof of concept for SAC with async training in the zoo.
And yes, if you use SubprocVecEnv, there will be a synchronization step after each step.
Alternatively, if a step is very long anyway, you could use a more sample-efficient approach (which also requires more compute) like DroQ, implemented in SBX: https://github.com/araffin/sbx (note SBX = SB3 + Jax and only covers a subset of SB3 features).
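To illustrate the synchronization step mentioned above, here is a minimal toy sketch using only the Python standard library. It is not SB3's actual SubprocVecEnv code: `env_step` and `synchronous_vec_step` are hypothetical names, and `time.sleep` stands in for a simulator step. It only shows why a batched step that waits for every worker takes about as long as the slowest environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def env_step(duration):
    """Hypothetical stand-in for one environment step lasting `duration` seconds."""
    time.sleep(duration)
    return duration


def synchronous_vec_step(durations):
    """Step all toy environments in parallel, then wait for every result.

    Returns the per-environment results and the wall-clock time of the
    batched step. The batch cannot return before the slowest worker is done.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        # pool.map blocks until all workers have returned (the sync point).
        results = list(pool.map(env_step, durations))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # One fast and one slow environment: the fast one idles until the slow one finishes.
    results, elapsed = synchronous_vec_step([0.05, 0.2])
    print(f"batched step took {elapsed:.2f}s for step durations {results}")
```

In this toy model the fast environment sits idle for most of each batched step, which is the waiting behaviour described in the question; an asynchronous setup would instead let each worker keep collecting data independently.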
❓ Question
Background: I am trying to implement an architecture similar to the one in this work: ReLMoGen.
Issue Description: When the environments are parallelized (= vectorized), it seems that the agents wait for each other until all of them have completed their action.
Question: As far as I understand, this issue does not depend on which simulator I use; it comes purely from how stable-baselines3 implements the training process for vectorized environments. Hence, is there a way (hopefully an easy one, such as enabling a flag on a wrapper) to enable asynchronous training in vectorized environments in stable-baselines3?
Checklist