Unity-Technologies / obstacle-tower-challenge

Starter Kit for the Unity Obstacle Tower challenge
Apache License 2.0

Error : The Unity environment took too long to respond #6

Closed spMohanty closed 5 years ago

spMohanty commented 5 years ago

I have been consistently getting this error when trying to get the OTC env to run:

I have checked the evaluation binary into the repository and pushed it (with some other minor fixes) to this branch: https://github.com/Unity-Technologies/obstacle-tower-challenge/tree/aicrowd_debug

mohanty@aicrowd-node-083:~$ docker exec -it obstacle_tower ./run.sh
root
Found path: /home/crowdai/./ObstacleTower/obstacletower.x86_64
Mono path[0] = '/home/crowdai/./ObstacleTower/obstacletower_Data/Managed'
Mono config path = '/home/crowdai/./ObstacleTower/obstacletower_Data/MonoBleedingEdge/etc'
Preloaded 'ScreenSelector.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
PlayerPrefs - Creating folder: /home/crowdai/.config/unity3d/Unity Technologies
PlayerPrefs - Creating folder: /home/crowdai/.config/unity3d/Unity Technologies/ObstacleTower
Logging to /home/crowdai/.config/unity3d/Unity Technologies/ObstacleTower/Player.log
Traceback (most recent call last):
  File "run.py", line 19, in <module>
    env = ObstacleTowerEnv(environment_filename)
  File "/srv/venv/lib/python3.6/site-packages/obstacle_tower_env.py", line 38, in __init__
    self._env = UnityEnvironment(environment_filename, worker_id, docker_training=docker_training)
  File "/srv/venv/lib/python3.6/site-packages/mlagents/envs/environment.py", line 67, in __init__
    aca_params = self.send_academy_parameters(rl_init_parameters_in)
  File "/srv/venv/lib/python3.6/site-packages/mlagents/envs/environment.py", line 493, in send_academy_parameters
    return self.communicator.initialize(inputs).rl_initialization_output
  File "/srv/venv/lib/python3.6/site-packages/mlagents/envs/rpc_communicator.py", line 77, in initialize
    "The Unity environment took too long to respond. Make sure that :\n"
mlagents.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
     The environment does not need user interaction to launch
     The Academy and the External Brain(s) are attached to objects in the Scene
     The environment and the Python interface have compatible versions.
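
The startup output above says the player logs to Player.log, which is usually the first place to look when the Python side times out. A minimal sketch for printing the tail of that file, assuming the path from the "Logging to" line; `tail_log` is a hypothetical helper, not part of the starter kit:

```python
from collections import deque
from pathlib import Path

# Path copied from the "Logging to ..." line in the output above.
PLAYER_LOG = Path(
    "/home/crowdai/.config/unity3d/Unity Technologies/ObstacleTower/Player.log"
)


def tail_log(path=PLAYER_LOG, lines=40):
    """Return the last `lines` lines of the Unity player log, or [] if it is missing."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open(errors="replace") as handle:
        # deque with maxlen keeps only the final `lines` entries while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


if __name__ == "__main__":
    for line in tail_log():
        print(line)
```

Run it inside the same container as run.sh, since the log lives under the container user's home directory.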

harperj commented 5 years ago

I do believe this is because of xvfb, as you mentioned by email, but the Dockerfile should work fine. This is actually just an issue with the run.py script, which should be fairly straightforward to fix. Until we can get that patched, you should be able to change the following line:

env = ObstacleTowerEnv(environment_filename)

to

env = ObstacleTowerEnv(environment_filename, docker_training=True)

By turning on the docker_training param, the script will run the environment executable within xvfb. It's also worth considering the OTC_EVALUATION_ENABLED environment variable which lets you run the executable separately from the run script.
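
One way to avoid hard-coding the flag is to decide it at startup. This is only a sketch: `env_kwargs` is a hypothetical helper, and treating a missing DISPLAY variable as "headless" is an assumption of this example, not something run.py does.

```python
import os


def env_kwargs(environ=None):
    """Choose keyword arguments for ObstacleTowerEnv on the current machine."""
    environ = os.environ if environ is None else environ
    # Assumption: no DISPLAY means there is no X server, so the executable
    # should be launched under xvfb via docker_training=True.
    headless = not environ.get("DISPLAY")
    return {"docker_training": headless}


# Intended use in run.py (not executed here, needs obstacle_tower_env):
#   env = ObstacleTowerEnv(environment_filename, **env_kwargs())
```

On a desktop with a display this leaves docker_training off, and inside a container without one it turns it on.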

You'll want to set OTC_EVALUATION_ENABLED when running the executable in a separate container. For example:

# start the gym interface / inference
docker run --env OTC_EVALUATION_ENABLED=true --network=host -it obstacle_tower_challenge:latest ./run.sh
# run the environment binary, port 5005
docker run --env OTC_EVALUATION_ENABLED=true --env OTC_EVALUATION_SEEDS=1,2,3 --network=host -it obstacle_tower_challenge:latest ./env.sh 5005 /home/crowdai/ObstacleTower/obstacletower_linux.x86_64

For now we can only use port 5005 for this type of remote executable, though we may be able to work around that. Ideally each evaluation would happen in its own pod, though I understand this might not be how it's set up currently.
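
If the two containers are launched from a script, the commands above can be assembled as argument lists. A minimal sketch, assuming the image tag, scripts and paths shown above; `docker_run_argv` is a hypothetical helper. Note that `docker run` reads one variable per `--env` flag, so each variable gets its own flag here.

```python
import shlex

IMAGE = "obstacle_tower_challenge:latest"  # image tag used in the commands above


def docker_run_argv(command, env, image=IMAGE):
    """Build a `docker run` argument list with one --env flag per variable."""
    argv = ["docker", "run"]
    for name, value in env.items():
        argv += ["--env", f"{name}={value}"]
    argv += ["--network=host", "-it", image]
    argv += list(command)
    return argv


# Gym interface / inference container.
agent = docker_run_argv(["./run.sh"], {"OTC_EVALUATION_ENABLED": "true"})

# Environment binary container, port 5005.
environment = docker_run_argv(
    ["./env.sh", "5005", "/home/crowdai/ObstacleTower/obstacletower_linux.x86_64"],
    {"OTC_EVALUATION_ENABLED": "true", "OTC_EVALUATION_SEEDS": "1,2,3"},
)

if __name__ == "__main__":
    # Print the commands for inspection; pass the lists to subprocess to run them.
    print(shlex.join(agent))
    print(shlex.join(environment))
```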

spMohanty commented 5 years ago

@harperj: That was really helpful! Thanks!

spMohanty commented 5 years ago

Actually, the timeout error is not resolved by this change. We have some of our team members looking at this, and they might reach out to you.

There's also a hypothesis that something is wrong with the binaries of the evaluation build. Multiple team members had issues getting the evaluation build to run.

spMohanty commented 5 years ago

@harperj: The xvfb issue still persists. The binary doesn't play nicely with xvfb at all:

mohanty@cluster026:/mount/SDE/Unity/aicrowd_otc_unity_contractor/Obstacl-Tower-Challenge-Draft$ export OTC_EVALUATION_ENABLED=true
mohanty@cluster026:/mount/SDE/Unity/aicrowd_otc_unity_contractor/Obstacl-Tower-Challenge-Draft$ xvfb-run --auto-servernum --server-args='-screen 0 640x480x24' ./ObstacleTower/obstacletower_eval.x86_64 --port 5005
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  85
  Current serial number in output stream:  86