spMohanty closed 5 years ago
I do believe this is because of xvfb, as you mentioned by email, but the Dockerfile should work fine. This is actually just an issue with the run.py script, which should be fairly straightforward to fix. Until we can get that patched, you should be able to change the following line:
env = ObstacleTowerEnv(environment_filename)
to
env = ObstacleTowerEnv(environment_filename, docker_training=True)
By turning on the docker_training param, the script will run the environment executable within xvfb. It's also worth considering the OTC_EVALUATION_ENABLED environment variable, which lets you run the executable separately from the run script.
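Until run.py is patched, one way to avoid editing the line by hand each time is to expose the parameter as a command-line switch. The following is only a minimal sketch, not the actual fix: the option name --docker_training, the default path, the helper names build_parser and make_env, and the obstacle_tower_env import path are all assumptions made for illustration. Only the ObstacleTowerEnv(environment_filename, docker_training=True) call itself comes from the comment above.

```python
import argparse


def build_parser():
    """Command-line options for a run script (option names are illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "environment_filename",
        nargs="?",
        default="./ObstacleTower/obstacletower",  # hypothetical default path
    )
    parser.add_argument(
        "--docker_training",
        action="store_true",
        help="run the environment executable within xvfb",
    )
    return parser


def make_env(args):
    """Create the environment, forwarding the docker_training switch."""
    # Imported lazily so the parser can be exercised without the package.
    # The module name is an assumption; adjust it to match your checkout.
    from obstacle_tower_env import ObstacleTowerEnv

    return ObstacleTowerEnv(
        args.environment_filename, docker_training=args.docker_training
    )
```

With this in place, the script would be started with --docker_training inside the container and without it on a machine that has a display.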
You'll want to set OTC_EVALUATION_ENABLED when running the executable in a separate container. For example:
# start the gym interface / inference
docker run --env OTC_EVALUATION_ENABLED=true --network=host -it obstacle_tower_challenge:latest ./run.sh
# run the environment binary, port 5005
docker run --env OTC_EVALUATION_ENABLED=true --env OTC_EVALUATION_SEEDS=1,2,3 --network=host -it obstacle_tower_challenge:latest ./env.sh 5005 /home/crowdai/ObstacleTower/obstacletower_linux.x86_64
For now we can only use port 5005 for this type of remote executable, though we may be able to work around that. Ideally each evaluation would happen in its own pod, though I understand this might not be how it's set up currently.
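When debugging the two-container setup, it can help to confirm that something is actually listening on port 5005 before starting the second container. The sketch below is a generic TCP probe using only the Python standard library. The function name wait_for_port and its defaults are hypothetical, and it makes no claim about which of the two processes opens the port.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Retries every `interval` seconds and returns False if no connection
    could be made before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; we only care
            # whether the port accepts connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 5005) could be called on the host (both containers use --network=host) to tell a missing listener apart from a later failure in the handshake.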
@harperj: That was really helpful! Thanks!
Actually, the timeout error is not resolved by this. We have some of our team members looking at it, and they might reach out to you.
There's also a hypothesis that the binaries of the evaluation build have some issue. Multiple team members had trouble getting the evaluation build to run.
@harperj: The xvfb issue still persists. The binary doesn't play nice with xvfb at all:
mohanty@cluster026:/mount/SDE/Unity/aicrowd_otc_unity_contractor/Obstacl-Tower-Challenge-Draft$ export OTC_EVALUATION_ENABLED=true
mohanty@cluster026:/mount/SDE/Unity/aicrowd_otc_unity_contractor/Obstacl-Tower-Challenge-Draft$ xvfb-run --auto-servernum --server-args='-screen 0 640x480x24' ./ObstacleTower/obstacletower_eval.x86_64 --port 5005
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 151 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 85
Current serial number in output stream: 86
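If the binary is launched from a script, an X error report like the one above can be told apart from an ordinary timeout by inspecting the captured stderr. This is a rough sketch under the assumption that the report keeps the "Key: value" layout shown here. The helper names parse_x_error and is_glx_context_failure are hypothetical.

```python
import re

# One "Key: value" pair per line; the key ends at the first colon.
_FIELD = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def parse_x_error(text):
    """Parse the 'Key: value' lines of an X error report into a dict."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


def is_glx_context_failure(text):
    """True if the report names X_GLXCreateContext as the failed request."""
    minor = parse_x_error(text).get("Minor opcode of failed request", "")
    return "X_GLXCreateContext" in minor
```

A launcher could log a distinct message when is_glx_context_failure returns True, so that a GLX context creation failure under xvfb is not mistaken for the environment simply being slow to start.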
I have been consistently getting this error when trying to get the OTC env to run.
I have checked the evaluation binary into the repository and pushed it (with some other minor fixes) to this branch: https://github.com/Unity-Technologies/obstacle-tower-challenge/tree/aicrowd_debug