n-kish opened 1 month ago
Hello,
If you want to parallelize gradient steps, have a look at https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/11 and the linked issues.
If you want to parallelize data collection, use VecEnv.
I'm also not sure why you have some TF code in there...
Hi @araffin, thanks for your response.
I may not have been quite clear about what I wanted to achieve.
I am parallelizing externally to the model instance, not within it. That is, for each different XML file I run a train_ppo.py instance in its own thread. As the XML file count grows, so does the number of parallel processes, and hence the number of gym envs created and PPO models trained in parallel. This is where I face the problem.
I notice that the gradient steps somehow take longer and longer to process as the thread count (i.e. the number of independent gym envs) increases. That normally shouldn't happen, because each thread should be treated independently and each model should train independently. (Please note the time_elapsed in seconds for just 4000 env steps in the attached screenshot.)
Hence your suggestions about parallelizing gradient steps and data collection via Stable-Baselines-Team/stable-baselines3-contrib/issues/11 and VecEnv, though useful, don't address my problem, because I still have num_cpu=1 even with VecEnv.
Hope this clarifies things further. Please let me know how I may go about this problem, thanks.
And yes, the TF code is a blunder; please ignore it.
For anyone that may be interested in this later: the problem I faced is due to the global autograd engine of Torch (as discussed here: Assumptions around Autograd and Python multi-threading).
I solved this by launching each run_ppo.py invocation as a separate bash process instead of relying on the multiprocessing module.
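The fix above can be sketched like this; the script name run_ppo.py comes from the comment, launching via subprocess from Python is equivalent to separate bash processes, and the XML file names and CLI flag are assumptions:

```python
# Each run gets its own interpreter, GIL, and autograd engine,
# so the runs can no longer slow each other down through shared Torch state.
import subprocess
import sys

xml_robots = ["robot_0.xml", "robot_1.xml", "robot_2.xml"]  # placeholder names
procs = [
    subprocess.Popen([sys.executable, "run_ppo.py", "--xml", xml])
    for xml in xml_robots
]
exit_codes = [p.wait() for p in procs]  # wait for all independent runs
```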
❓ Question
I am trying to parallelise execution of PPO training on MuJoCo environments, where each multiprocessing worker uses a slightly modified XML file to train PPO. For this, I currently use:
Here the simulate_robot function fires up the same Python file, passing one xml_robot from xml_robots as an argument. That Python file (train_ppo.py) currently looks like this:
When I ran this code, I found that with more than one process, the optimizer call in stable_baselines3/ppo/ppo.py takes longer and longer as the number of processes increases. I have ensured there is no cross-talk in any other part of the code, except for some interesting time delays in the code block below.
stable-baselines3/stable_baselines3/ppo/ppo.py, lines 278 to 282
Can you please help me understand what I might be dealing with here, and suggest a possible solution or alternate path to achieve the desired multiprocessing capability with SB3?