Closed pouyajafarian closed 3 years ago
Hey.
Depends on what you want to parallelize. If you want to parallelize the algorithm update itself, then no, there is no support for a multi-GPU setup right now. Vectorized environments only parallelize the environments. If you have compute-heavy environments, running them in parallel might speed up gathering samples, but it does not affect the algorithm training itself.
> Does this mean using the vectorized wrapper I should be able to run on multiple GPUs?
PPO is quite lightweight, so unless you are using a big network for the policy/value function, I would recommend getting better CPUs rather than more GPUs. The bottleneck usually comes from the environment simulation, not the gradient update.
Please take a look at the issue checklist next time as it appears to be a duplicate ;)
Related issues:
Hi! Is it possible to deploy different training tasks on different GPUs? All I know is to set `device=cpu` or `cuda`. I tried to set `device=cuda:1` but I found that the task is still deployed on the default `cuda:0`. Thanks!
Could you please open an issue with a minimal code example to reproduce? (This seems to be a bug.)
Can you try with `device=torch.device("cuda:1")`?
As a quick fix, you can play with the `CUDA_VISIBLE_DEVICES` environment variable to mask the other GPUs.
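A minimal sketch of that masking trick; the key caveat is that the variable must be set before CUDA is initialized, i.e. before the first `import torch` in a fresh process:

```python
import os

# Expose only physical GPU 1 to this process. This must happen before
# torch/CUDA is initialized, e.g. at the very top of the training script.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # noqa: E402  (imported after the mask on purpose)

# Inside this process the single visible GPU is renumbered as cuda:0,
# so the default CUDA device already points at physical GPU 1.
```

Setting the variable in the shell (`CUDA_VISIBLE_DEVICES=1 python train.py`) achieves the same effect without touching the code.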
Hi! I have spent some time trying to reproduce this error but failed. Currently, `device="cuda:1"`, `device=torch.device("cuda:1")`, and `os.environ['CUDA_VISIBLE_DEVICES'] = "0"` all work to switch GPUs for me. If I run into this error again, I will open an issue to let you know. Or maybe I overlooked some details. Thanks for your reply!
Hello,
I would like to run the PPO algorithm (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) on a Google Cloud VM, distributed across multiple GPUs. Looking at the documentation, I found the following text:
> ...creating a multiprocess vectorized wrapper for multiple environments, distributing each environment to its own process, allowing significant speed up when the environment is computationally complex.
Does this mean that using the vectorized wrapper I should be able to run on multiple GPUs?
Thanks for your help!