I'm training a RecurrentPPO model on a custom environment. To speed up training, I switched to SubprocVecEnv with num_envs = 32. I noticed that this completely changed the training performance compared to DummyVecEnv with num_envs = 1. With 32 envs I reduced n_steps by a factor of 32, so the batch size (n_steps × num_envs) remained the same. Below is the plot with two training runs: pink is DummyVecEnv (1) and green is SubprocVecEnv (32).
Do you know what could explain such a large change?
Thank you in advance.
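To make the comparison concrete, here is a minimal sketch of the arithmetic behind the two runs. The `RolloutConfig` class and the value `n_steps = 2048` are hypothetical, since the actual `n_steps` is not stated here. It only illustrates that the total number of transitions per update (`n_steps * n_envs`) matches, while each environment contributes a contiguous segment that is 32 times shorter. That is one difference between the runs, not a confirmed explanation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutConfig:
    """Hypothetical helper, not part of any library."""

    n_envs: int
    n_steps: int  # steps collected per environment per update

    @property
    def rollout_size(self) -> int:
        # Total transitions collected per update: n_steps * n_envs
        return self.n_steps * self.n_envs


# Assumed values for illustration only.
single = RolloutConfig(n_envs=1, n_steps=2048)
parallel = RolloutConfig(n_envs=32, n_steps=2048 // 32)

# Same total number of transitions per update...
print(single.rollout_size, parallel.rollout_size)
# ...but very different contiguous segment lengths per environment.
print(single.n_steps, parallel.n_steps)
```

So although the totals match, the 32-env run sees 32 short segments from different environments per update instead of one long segment from a single environment.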
Code example
No response
Relevant log output / Error message
No response
System Info
No response
Checklist
[X] I have checked that there is no similar issue in the repo