Open TrentBrick opened 4 years ago
I have started using the Ray multiprocessing library which so far seems to be working very well.
Hi, have you ever run into an issue like this? When I try to run controllertrain.py on my local computer, the 32 GB of RAM is quickly exhausted, even though I have set work_number to 1. I don't know why; have you ever seen this issue?
I am running `controllertrain.py` headlessly on a Google Cloud VM with Python 3.7 and xvfb. Everything works, but I have noticed what seems to be a linear relationship between the number of workers I allow and the time each worker takes to execute its rollout. With a single worker, it can run 200 steps of the environment in 5 seconds. With 10 workers, each worker only gets through 10 steps in the same time, so the 10 workers combined (100 steps) are actually 50% slower at getting through the iterations. (Each worker is printing the iteration it is on during its rollout; I added a print statement inside `misc.utils.py` for this.)

Has anyone else observed a similar effect? What could be wrong with my server? I am not using any GPUs, just the CPU to run the VAE and MDRNN.
Thank you.