DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] DRL training with vectorized environments and asynchronous time-steps #1786

Closed: akmandor closed this issue 8 months ago

akmandor commented 9 months ago

❓ Question

Background: I am trying to implement an architecture similar to the one in this work: ReLMoGen.

Issue Description: When the environments are parallelized (i.e., vectorized), the agents appear to wait for each other until both have completed their actions.

Question: As far as I understand, this issue does not depend on which simulator I use; it comes from how Stable-Baselines3 implements the training loop for vectorized environments. Is there a way (ideally an easy one, such as enabling a flag on a wrapper) to enable asynchronous training with vectorized environments in stable-baselines3? A minimal sketch of the kind of setup I mean is shown below.
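For reference, a minimal sketch of the vectorized training setup in question (the environment and hyperparameters are placeholders, not the actual project code; assumes SB3 ≥ 2.0 with Gymnasium):

```python
# Sketch of a vectorized SAC training run with two worker processes.
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # "Pendulum-v1" stands in for the custom robot / simulator environment.
    vec_env = make_vec_env("Pendulum-v1", n_envs=2, vec_env_cls=SubprocVecEnv)
    model = SAC("MlpPolicy", vec_env, verbose=1)
    # Inside learn(), every call to vec_env.step() waits for *both* workers,
    # which is the lock-step behavior described above.
    model.learn(total_timesteps=10_000)
```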


araffin commented 8 months ago

Hello, I guess it is a duplicate of https://github.com/DLR-RM/stable-baselines3/issues/715? We have a proof of concept for SAC with asynchronous training in the RL Zoo.
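For illustration only (this is not the code from the RL Zoo proof of concept), the idea behind asynchronous off-policy training is to decouple environment stepping from gradient updates, e.g. a collector thread and a learner thread sharing a replay buffer:

```python
# Toy sketch: slow environment steps no longer block gradient updates.
import random
import threading
import time
from collections import deque

replay_buffer = deque(maxlen=10_000)
stop_event = threading.Event()


def collector():
    """Stand-in for an actor stepping a slow environment and storing transitions."""
    while not stop_event.is_set():
        time.sleep(0.1)  # expensive env.step() / motion planner call
        replay_buffer.append(("obs", "action", "reward", "next_obs"))


def learner():
    """Stand-in for gradient updates sampled from the buffer, independent of stepping."""
    updates = 0
    while not stop_event.is_set():
        if len(replay_buffer) >= 32:
            _batch = random.sample(list(replay_buffer), 32)  # would feed an SGD step
            updates += 1
        time.sleep(0.01)
    print(f"performed {updates} updates")


if __name__ == "__main__":
    threads = [threading.Thread(target=collector), threading.Thread(target=learner)]
    for t in threads:
        t.start()
    time.sleep(2.0)
    stop_event.set()
    for t in threads:
        t.join()
```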

And yes, if you use SubprocVecEnv, there will be a synchronization step after each step.
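A small self-contained example (toy environment, assumes SB3 ≥ 2.0 with Gymnasium) that makes this synchronization visible: one vectorized step takes roughly as long as the slowest worker:

```python
import time

import gymnasium as gym
import numpy as np
from stable_baselines3.common.vec_env import SubprocVecEnv


class SlowStepEnv(gym.Env):
    """Toy env whose step duration is controlled by a per-instance delay."""

    def __init__(self, delay: float):
        super().__init__()
        self.delay = delay
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        time.sleep(self.delay)  # simulate a slow simulator / motion planner
        return np.zeros(1, dtype=np.float32), 0.0, False, False, {}


if __name__ == "__main__":
    # One fast worker (0.01 s/step) and one slow worker (0.5 s/step).
    vec_env = SubprocVecEnv([lambda: SlowStepEnv(0.01), lambda: SlowStepEnv(0.5)])
    vec_env.reset()
    start = time.time()
    vec_env.step(np.zeros((2, 1), dtype=np.float32))
    # Elapsed time is dominated by the slowest worker (~0.5 s):
    # the fast environment waits for the slow one.
    print(f"One vectorized step took {time.time() - start:.2f} s")
    vec_env.close()
```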

Alternatively, if a single step is very long anyway, you could use a more sample-efficient approach (that also requires more compute) like DroQ, implemented in SBX: https://github.com/araffin/sbx (note: SBX = SB3 + Jax and only covers a subset of SB3 features).
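A minimal sketch of such a setup, assuming the DroQ class exposes the usual SB3-style constructor and learn() API shown in the SBX README (newer SBX versions configure DroQ through SAC policy kwargs, so check the README for the current interface):

```python
# Sample-efficient training: several gradient steps per (expensive) env step.
from sbx import DroQ  # assumes the DroQ class from the SBX README

model = DroQ(
    "MlpPolicy",
    "Pendulum-v1",       # placeholder for the custom environment
    learning_starts=100,
    gradient_steps=20,   # many updates per environment step
    verbose=1,
)
model.learn(total_timesteps=5_000)
```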