ctallec / world-models

Reimplementation of World Models (Ha and Schmidhuber, 2018) in PyTorch
MIT License

Multiprocessing very slow #34

Open TrentBrick opened 4 years ago

TrentBrick commented 4 years ago

I am running controllertrain.py on a Google Cloud VM headlessly with Python 3.7 and xvfb. Everything works, but I have noticed what seems to be a linear relationship between the number of workers I allow and the time it takes each worker to execute its rollout.

If only one worker is allowed, it can run 200 steps of the environment in 5 seconds. With 10 workers, each worker gets through only 10 steps in the same time, so together they complete 100 steps where the single worker completed 200: the parallel run is actually 50% slower at getting through the iterations. (Each worker prints the iteration it is on in its rollout; I added a print statement inside misc.utils.py for this.)

Has anyone else observed a similar effect? What could be wrong with my server? I am not using any GPUs, just CPU to run the VAE and MDRNN.
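One plausible explanation (an assumption on my part, not something I have confirmed) is CPU oversubscription: by default PyTorch gives every process an OpenMP/MKL thread per core, so 10 worker processes can contend for 10 × n_cores threads. A minimal sketch of pinning each worker to a single thread; the rollout body is a hypothetical stand-in for the real VAE/MDRNN rollout:

```python
import os

# Cap OpenMP/MKL threads before torch is imported so the limits take effect.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch
from multiprocessing import Pool

def rollout(worker_id):
    # Also cap PyTorch's intra-op thread pool inside each worker, so that
    # 10 workers use roughly 10 cores instead of 10 * n_cores.
    torch.set_num_threads(1)
    # ... load the VAE / MDRNN and step the environment here ...
    return worker_id

if __name__ == "__main__":
    with Pool(processes=10) as pool:
        print(pool.map(rollout, range(10)))
```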

Thank you.

TrentBrick commented 4 years ago

I have started using the Ray multiprocessing library, which so far seems to be working very well.
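For reference, a minimal sketch of what parallel rollouts look like with Ray; the rollout body here is a dummy stand-in for the actual controller evaluation:

```python
import ray

ray.init(num_cpus=10)  # one scheduling slot per concurrent rollout

@ray.remote
def rollout(seed):
    # Dummy stand-in: a real worker would load the VAE / MDRNN, step the
    # environment, and return the cumulative reward.
    import random
    random.seed(seed)
    return sum(random.random() for _ in range(200))

# Launch 10 tasks; Ray runs them in separate worker processes.
returns = ray.get([rollout.remote(s) for s in range(10)])
print(returns)
```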

longfeizhang617 commented 3 years ago

Hi, have you ever run into an issue like this? When I try to run controllertrain.py on my local computer, my 32 GB of RAM is quickly exhausted, even though I have set the work_number to 1. I don't know why. Have you ever seen this issue?
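A first diagnostic step (a suggestion, not something confirmed in this thread) is to log per-process memory, to see whether the parent or the worker is the one consuming the 32 GB. A minimal sketch using psutil; log_rss is a hypothetical helper name:

```python
import os
import psutil  # assumption: psutil is installed (pip install psutil)

def log_rss(tag):
    # Hypothetical helper: print this process's resident set size, to see
    # whether the parent or a worker is the one consuming the RAM.
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    print(f"[{tag}] pid={os.getpid()} rss={rss_gb:.2f} GB")

log_rss("startup")
# ... call again after loading the dataset / VAE / MDRNN, and inside
# each worker, to localize the allocation that exhausts the RAM ...
```

Calling it before and after each loading step in both the parent and the worker should narrow down where the memory goes.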