Unity-Technologies / obstacle-tower-env

Obstacle Tower Environment
Apache License 2.0

Cannot launch more than 65 environments #70

Open Holt59 opened 5 years ago

Holt59 commented 5 years ago

I tried to launch 100 environments and got a UnityTimeoutException when creating the 66th one. I checked multiple times, and the exception always occurs on the 66th instantiation.

I am using gcloud with a K80 GPU, and memory usage stays below what is available.

awjuliani commented 5 years ago

Hi @Holt59

One possibility is that the port for that environment is already taken for some reason. We start with port 5005 (worker_id=0) and increment from there. I would suggest trying different worker IDs.
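A minimal sketch of the port arithmetic described above, assuming each environment uses port `5005 + worker_id` on localhost. `port_for_worker` and `is_port_free` are hypothetical helpers, not part of obstacle-tower-env. The free-port check just tries to bind a TCP socket, so it is only a heuristic: a port can still be grabbed by another process between the check and the launch.

```python
import socket

BASE_PORT = 5005  # port used by worker_id=0, per the comment above


def port_for_worker(worker_id: int, base_port: int = BASE_PORT) -> int:
    """Port a given worker_id would use, assuming ports increment by one per worker."""
    return base_port + worker_id


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Heuristic check: True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically EADDRINUSE when something else holds the port
            return False
    return True


if __name__ == "__main__":
    # Report which of the first 100 worker ports appear to be taken right now.
    busy = [w for w in range(100) if not is_port_free(port_for_worker(w))]
    print("worker ids whose ports look busy:", busy)
```

Under this mapping, the 66th environment (worker_id=65) would use port 5070, so that is the first port worth inspecting.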

If that doesn't seem to be the issue, another thing to try would be adding a wait time between launching environments. We've had reports that errors like this can occur when too many Unity processes are launched concurrently.

Sohojoe commented 5 years ago

@Holt59 you may be running out of GPU memory. I've only been able to run 2x16 locally (16 per GPU; one is a 1080 with 8 GB, the other a 1060 with 6 GB). In the large-scale curiosity paper, they stated they were only able to get 40 Unity environments running (I can't remember whether that was on a 4- or 8-GPU machine).

Also, I use a one-second delay between launching each Unity instance.
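A minimal sketch of launching environments one at a time with a fixed delay, as suggested in this thread. `launch_staggered` is a hypothetical helper; `make_env` is any caller-supplied factory that takes a worker_id and returns an environment. If a launch fails partway through, the sketch closes the environments that did start (when they have a `close()` method) before re-raising, so their Unity processes are not left running.

```python
import time
from typing import Callable, List, TypeVar

EnvT = TypeVar("EnvT")


def launch_staggered(
    make_env: Callable[[int], EnvT],
    num_envs: int,
    delay_s: float = 1.0,
    first_worker_id: int = 0,
) -> List[EnvT]:
    """Create num_envs environments sequentially, sleeping delay_s between launches."""
    envs: List[EnvT] = []
    try:
        for i in range(num_envs):
            if i > 0:
                time.sleep(delay_s)  # give the previous Unity process time to start up
            envs.append(make_env(first_worker_id + i))
    except Exception:
        # Shut down whatever did start before propagating the error.
        for env in envs:
            close = getattr(env, "close", None)
            if callable(close):
                close()
        raise
    return envs


# Hypothetical usage, assuming the obstacle_tower_env package is installed, a local
# build exists at this path, and the constructor accepts a worker_id argument:
# from obstacle_tower_env import ObstacleTowerEnv
# envs = launch_staggered(
#     lambda wid: ObstacleTowerEnv("./ObstacleTower/obstacletower", worker_id=wid),
#     num_envs=100,
#     delay_s=1.0,
# )
```

Launching sequentially trades startup time for fewer simultaneous Unity processes competing during initialization; with a one-second delay, 100 environments take at least 99 seconds to start.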

Holt59 commented 5 years ago

@awjuliani I've already checked the port; I'll try adding a delay between launches.

@Sohojoe I have a 12 GB K80, and I am only starting the environments, with no extra algorithms running. As I said, the GPU memory consumption reported by nvidia-smi is nowhere near its limit. I'll test the delay between launches.
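To test the GPU-memory hypothesis more precisely, one option is to record nvidia-smi's per-GPU readings after each environment is created and look at the values around the 66th launch. A minimal sketch, assuming `nvidia-smi` is on PATH; `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names. The query flags are standard nvidia-smi options, and with `nounits` the values are in MiB.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' rows (MiB, no header, no units) into (used, total) pairs."""
    rows: List[Tuple[int, int]] = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Return (used_MiB, total_MiB) for each visible GPU, as reported by nvidia-smi."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` once per launch gives a memory curve; if usage is still far from the total when the timeout hits, GPU memory is unlikely to be the cause.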

Sohojoe commented 5 years ago

@Holt59 did you get around this? I found that some ports were in use on my PC, so I added a hard-coded hack to skip them.
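The actual hack isn't shown in the thread. A minimal sketch of the idea, assuming the `5005 + worker_id` port mapping described earlier and a hand-maintained set of ports known to be busy; `worker_ids_skipping` is a hypothetical helper, and the port numbers in the usage line are purely illustrative.

```python
from typing import Iterable, List

BASE_PORT = 5005  # port for worker_id=0, per the thread


def worker_ids_skipping(
    num_envs: int, skip_ports: Iterable[int], base_port: int = BASE_PORT
) -> List[int]:
    """Return num_envs worker ids, skipping any whose port (base_port + id) is in skip_ports."""
    skip = set(skip_ports)
    ids: List[int] = []
    worker_id = 0
    # Terminates because skip is finite: eventually every further port is allowed.
    while len(ids) < num_envs:
        if base_port + worker_id not in skip:
            ids.append(worker_id)
        worker_id += 1
    return ids


# Illustrative only: pretend ports 5010 and 5042 are occupied on this machine.
ids = worker_ids_skipping(100, {5010, 5042})
```

Each returned id can then be passed as the worker_id for one environment, so no environment is assigned a port on the skip list.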

Holt59 commented 5 years ago

@Sohojoe I did not solve this, but I didn't look into it much because I ran into other issues... I checked the ports on my computer and nothing was running on them, so I don't think that was the problem.