Unity-Technologies / obstacle-tower-env

Obstacle Tower Environment
Apache License 2.0

Issue with environment inside docker #72

Closed Holt59 closed 5 years ago

Holt59 commented 5 years ago

I am trying to run the Docker image built from the challenge and am running into some trouble (reported here).

While trying to work around some issues, I found a very strange "bug" in the environment. I start a bash shell inside a Docker container:

docker run \
    --env OTC_EVALUATION_ENABLED=true \
    --env DISPLAY=:0 \
    --network=host \
    -it obstacle_tower_challenge:latest bash

I then start a Python interpreter and run the following:

from obstacle_tower_env import ObstacleTowerEnv

e = ObstacleTowerEnv('./ObstacleTowerEnv/obstacletower',
                     docker_training=True, worker_id=50)

(I have to use worker_id=50 because I have workers running on the host itself.)

But I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/conda/lib/python3.6/site-packages/obstacle_tower_env.py", line 45, in __init__
    timeout_wait=timeout_wait)
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 57, in __init__
    "If the environment name is None, "
mlagents_envs.exception.UnityEnvironmentException: If the environment name is None, the worker-id must be 0 in order to connect with the Editor.

...but the environment name is clearly not None. Furthermore, if I run the same statement again, I get a different error:

Traceback (most recent call last):
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 71, in check_port
    s.bind(("localhost", port))
OSError: [Errno 98] Address already in use

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/conda/lib/python3.6/site-packages/obstacle_tower_env.py", line 45, in __init__
    timeout_wait=timeout_wait)
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 50, in __init__
    self.communicator = RpcCommunicator(worker_id, base_port, timeout_wait)
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 44, in __init__
    self.create_server()
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 50, in create_server
    self.check_port(self.port)
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 73, in check_port
    raise UnityWorkerInUseException(self.worker_id)
mlagents_envs.exception.UnityWorkerInUseException: Couldn't start socket communication because worker number 50 is still in use. You may need to manually close a previously opened environment or use a different worker number.

...which would mean that the worker was indeed started?

And if I switch the ID to 51, I get the first error, then the second, and so on...

This seems to happen only inside the Docker image built by following the instructions in the README of the challenge repository, but since it is environment-related, I am opening the issue on this repository.

Holt59 commented 5 years ago

Closing since this is related to the grading environment. It would be great to force worker_id to 0 when is_grading() is True, so that the resulting message is easier to understand. I'll send a PR if I can.
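A minimal sketch of what such a guard could look like, not the change that was actually made: the helper names is_grading and effective_worker_id are hypothetical here, and the assumption that grading mode is signalled by the OTC_EVALUATION_ENABLED variable comes only from the docker run command above.

```python
import os
import warnings


def is_grading(environ=os.environ):
    # Assumption: grading mode is switched on by OTC_EVALUATION_ENABLED,
    # as in the `docker run --env OTC_EVALUATION_ENABLED=true` command.
    return bool(environ.get("OTC_EVALUATION_ENABLED"))


def effective_worker_id(worker_id, grading):
    """Return the worker id to use, forcing 0 while grading."""
    if grading and worker_id != 0:
        warnings.warn(
            f"Grading mode is enabled: ignoring worker_id={worker_id} "
            "and using worker_id=0 instead."
        )
        return 0
    return worker_id
```

With a guard like this, the user would see a warning that names the real cause (grading mode) instead of a message about the environment name being None.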