Unity-Technologies / obstacle-tower-env

Obstacle Tower Environment
Apache License 2.0

Multi-processed envs slow down in background #84

Closed: eito-fis closed this issue 5 years ago

eito-fis commented 5 years ago

Hi, I have a rather strange problem and haven't had much success debugging it.

I'm currently using a parallel env wrapper almost identical to the one in stable-baselines. The first few synchronous rollouts on 4 environments run fine at ~40 steps per second, but then throughput drops, seemingly at random, to ~2 steps per second. During the slowdown, CPU usage falls to almost nothing but GPU usage stays the same, and debugging has shown that my parallel env is simply waiting for the Obstacle Tower env to return a new step.
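To pin down exactly when the throughput collapses, one option is to time every synchronous step of the parallel env. Below is a minimal sketch, assuming nothing about the wrapper beyond "one call per synchronous step". `StepRateMonitor` is a hypothetical helper, not part of obstacle-tower-env or stable-baselines. It keeps a sliding window of step timestamps, reports steps per second, and flags a slowdown when the rate falls below a chosen fraction of the best full-window rate seen so far. The window size and the 0.25 threshold are arbitrary assumptions.

```python
import time
from collections import deque


class StepRateMonitor:
    """Sliding-window steps/sec tracker that flags sudden slowdowns.

    Hypothetical diagnostic helper; not part of obstacle-tower-env or
    stable-baselines.
    """

    def __init__(self, window=50, drop_ratio=0.25, clock=time.monotonic):
        self.drop_ratio = drop_ratio  # flag when rate < drop_ratio * best rate
        self.clock = clock            # injectable so tests can use fake times
        # window + 1 timestamps bound exactly `window` step intervals
        self.stamps = deque(maxlen=window + 1)
        self.best_rate = 0.0          # best full-window rate seen so far

    def rate(self):
        """Steps per second over the stored timestamps, or None if undefined."""
        if len(self.stamps) < 2:
            return None
        elapsed = self.stamps[-1] - self.stamps[0]
        if elapsed <= 0:
            return None
        return (len(self.stamps) - 1) / elapsed

    def tick(self):
        """Record one completed synchronous step; return (rate, slowed)."""
        self.stamps.append(self.clock())
        rate = self.rate()
        if rate is None:
            return None, False
        # Only trust the rate as a baseline once the window is full.
        if len(self.stamps) == self.stamps.maxlen and rate > self.best_rate:
            self.best_rate = rate
        slowed = self.best_rate > 0 and rate < self.drop_ratio * self.best_rate
        return rate, slowed
```

Call `tick()` right after each `step` call on the parallel env and log whenever `slowed` is `True`. The timestamps can then be lined up with CPU/GPU usage or with the moment the Unity windows lose focus. Because the clock is injectable, the logic can also be checked with synthetic timestamps.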

The strange part is that the slowdown is fixed immediately by alt-tabbing to, or clicking on, each of the Unity executables. Afterward, a few more full rollouts run before it slows down again.

Also relevant is the fact that I don't have this issue when running on a Debian GCP VM. It's unclear whether that's because it's Debian or because it's headless.

Before I sit down and write a script that automatically alt-tabs through the windows during training, has anyone seen a similar problem, or does anyone have guidance on what to do? I'm on Python 3.6.3, macOS 10.13.6, and Obstacle Tower Env v1.3.

awjuliani commented 5 years ago

Hi @eito-fis

Thanks for bringing this to our attention. We will take a look and try to get back to you. This seems to be a macOS-specific issue, possibly caused by some kind of power management on that platform.

harperj commented 5 years ago

I've noticed this as well. I wonder if changing the run priority (e.g. https://superuser.com/questions/42817/is-there-any-way-to-set-the-priority-of-a-process-in-mac-os-x) would alleviate the problem. This looks like an optimization that specifically applies when the window isn't visible.
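Here is a minimal sketch of the priority change suggested above, done from Python instead of with `renice`. `set_niceness` is a hypothetical helper built on the standard library's `os.setpriority` and `os.getpriority`, which are Unix-only. It assumes you already know the PID of each Unity process, for example from Activity Monitor or `pgrep`. Whether CPU scheduling priority has any effect on the background throttling described in this thread is untested.

```python
import os


def set_niceness(pid, niceness):
    """Set the CPU scheduling niceness of process `pid` (Unix only).

    Hypothetical helper wrapping os.setpriority, roughly what `renice` does.
    Lower values mean higher priority; lowering a process's niceness below
    its current value usually requires root. pid 0 means the calling process.
    Returns the niceness the OS reports after the change.
    """
    if not hasattr(os, "setpriority"):
        raise OSError("os.setpriority is not available on this platform")
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

This workaround would need a higher priority, meaning a negative niceness, for the Unity processes. The script would therefore typically have to run with root privileges, just as `sudo renice` would.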

mengdong commented 5 years ago

I notice this when launching the env under VirtualGL as well: about a 50% decrease in throughput across 48 environments.

eito-fis commented 5 years ago

v2.1 seems to solve this issue.

Thanks!