Multiple issues with docker, python versions and build script

Holt59 commented 5 years ago

I am trying to test my environment with docker on a GCloud VM.

I noticed multiple issues while trying to build and run the docker:

1. The only version working with the README tutorial is python 3.6

This is kind of annoying:

3.5 does not work due to aicrowd-repo2docker using f-strings.
3.7 does not work because ml-agents cannot be installed with 3.7.

Python 3.7 can be used, but aicrowd-repo2docker must be installed without using requirements.txt.

A note should be added to the README. I have python 3.5 by default, and I compiled python 3.7 from scratch thinking it would work, just to notice it does not with ml-agents... Had I known, I would have built python 3.6.
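A minimal sketch of a pre-flight check that would have caught this early, assuming the constraint described above (only Python 3.6 works end to end with the README tutorial). The `SUPPORTED` constant and the `is_supported` helper are hypothetical names, not part of the repository:

```python
import sys

# Assumption taken from this issue: the README flow only works on Python 3.6
# (3.5 lacks f-strings, and ml-agents could not be installed on 3.7).
SUPPORTED = (3, 6)


def is_supported(version_info=sys.version_info):
    """Return True if the major.minor version matches the supported one."""
    return tuple(version_info[:2]) == SUPPORTED


if __name__ == "__main__":
    if not is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in SUPPORTED)
        sys.exit("Python {} required, found {}".format(wanted, found))
```

The error message uses `str.format` rather than an f-string so that the check itself still runs on Python 3.5.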

2. Small issue with `build.sh`

This:

./build.sh

...does not work if the shell is not bash-compliant (e.g. fish). A shebang should be added, or the line should be changed to `bash build.sh`.
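As a rough illustration, the following sketch checks whether a script starts with a shebang line, which is what lets the kernel pick the interpreter regardless of the calling shell. The `has_shebang` helper is a hypothetical name introduced here:

```python
def has_shebang(path):
    """Return True if the file at `path` begins with the two bytes '#!'."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"#!"


if __name__ == "__main__":
    import sys

    # Usage: python check_shebang.py build.sh
    for script in sys.argv[1:]:
        status = "ok" if has_shebang(script) else "missing shebang"
        print("{}: {}".format(script, status))
```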

3. Cannot run the docker containers if there are agents running

The docker containers cannot be launched if there are agents running alongside them on the same host, due to the --network=host option. And the worker ID cannot be changed without modifying the source code of run.py.
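With --network=host the containers share the host's ports, so a conflict of this kind can be checked for before launching. The sketch below assumes that ML-Agents listens on a base port of 5005 plus the worker ID; that default is an assumption not stated in this thread, and `port_for_worker` and `is_port_free` are hypothetical helper names:

```python
import socket

# Assumption: ML-Agents' default base port; the worker ID is added to it.
BASE_PORT = 5005


def port_for_worker(worker_id, base_port=BASE_PORT):
    """Port a worker would use under the assumed base_port + worker_id rule."""
    return base_port + worker_id


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    for worker_id in range(4):
        port = port_for_worker(worker_id)
        state = "free" if is_port_free(port) else "in use"
        print("worker_id={} -> port {} ({})".format(worker_id, port, state))
```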

4. Cannot run the docker containers

Even after modifying the worker ID or trying to put the two containers on a docker network (--network=ot-network), the agent fails to launch with a Unity time-out exception:

mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
         The environment does not need user interaction to launch
         The Academy and the External Brain(s) are attached to objects in the Scene
         The environment and the Python interface have compatible versions.

I am using a GCloud VM created following the tutorial. I tried running sudo /usr/bin/X :0 and adding --env DISPLAY=:0 to the docker command line but it did not work.

awjuliani commented 5 years ago

Adding @harperj who may be able to provide some context. These are good recommendations that we will take into account in the next round of the contest.

harperj commented 5 years ago

1. You could run aicrowd-repo2docker and your ml-agents script using different versions of Python. I believe we should be able to relax the requirement of ml-agents environments to allow Python 3.7; I'll bring that up with the team. The reason ml-agents hasn't supported Python 3.7 is that until recently TensorFlow hasn't supported Python 3.7.
2. Agreed.
3. You're intended to change the run.py script. If you're running agents on the host as well as in a docker container, you're doing something outside of what the guide explains (how to test out evaluation), and I'd expect that anyone would want to customize the run script in this case.
4. This could be a number of issues, but one thing to check is the Player.log file created by Unity. You can find it under ~/.config/. Could you share that?
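A small sketch for locating that log, assuming only what is stated above (the file is named Player.log and lives somewhere under ~/.config/). The exact subdirectory is not given here, so the hypothetical `find_player_logs` helper searches recursively:

```python
from pathlib import Path


def find_player_logs(config_dir=None):
    """Return all Player.log files under config_dir, newest first.

    config_dir defaults to ~/.config, the location mentioned in this thread.
    """
    root = Path(config_dir) if config_dir is not None else Path.home() / ".config"
    if not root.is_dir():
        return []
    logs = [path for path in root.rglob("Player.log") if path.is_file()]
    return sorted(logs, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for log_path in find_player_logs():
        print(log_path)
```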

Holt59 commented 5 years ago

@harperj

I've solved the problem, and I don't have the log file anymore. I think the issue had something to do with worker_id when in evaluation mode.

I'm not saying that anything should be fixed regarding this in the docker examples, but it would be great if some information could be added to the README. It took me some time to realize that when OTC_EVALUATION_ENABLED is true, the behavior is different.

In particular, environment_filename is set to None automatically when is_grading() is True but worker_id is not set to 0, which caused me some headache until I checked the actual code... (I was trying to set the worker ID, but this was throwing some strange exception).
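To make the behavior concrete, here is a hedged sketch of how a run script could normalize its arguments before creating the environment. It is not the library's code: `resolve_env_args` is a hypothetical helper, and the way the OTC_EVALUATION_ENABLED flag is parsed is an assumption. It mirrors what is described above, with the addition that worker_id is forced to 0 in evaluation mode:

```python
import os


def resolve_env_args(requested_worker_id, environment_filename, environ=None):
    """Return the environment arguments to use, depending on evaluation mode.

    Hypothetical helper. Assumption: the flag counts as enabled when
    OTC_EVALUATION_ENABLED is set to anything other than "", "0" or "false".
    """
    environ = os.environ if environ is None else environ
    flag = environ.get("OTC_EVALUATION_ENABLED", "").strip().lower()
    evaluating = flag not in ("", "0", "false")
    if evaluating:
        # In evaluation mode no local binary is launched, and a custom
        # worker ID led to a confusing exception, so use worker 0.
        return {"environment_filename": None, "worker_id": 0}
    return {
        "environment_filename": environment_filename,
        "worker_id": requested_worker_id,
    }
```

The returned dictionary can then be passed on to whatever constructor run.py uses, so the evaluation-mode special case lives in one place.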

Unity-Technologies / obstacle-tower-challenge