Closed by icaro56 6 years ago
Hi @icaro56,
You will have to be sure to install tensorflow-gpu instead of tensorflow, which is installed by default due to our requirements for the mlagents package. Currently the implementation of PPO we use does not take great advantage of a GPU. Only if you were to use large visual observations would you typically find an advantage from using a GPU.
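For anyone checking their own setup, here is a minimal Python sketch of how one might confirm which TensorFlow distribution pip installed and whether TensorFlow can see a GPU. It is not part of ML-Agents. The helper names are made up for illustration, it assumes Python 3.8+ (for importlib.metadata), and it relies on the internal tensorflow.python.client.device_lib module, which may differ between TensorFlow versions.

```python
from importlib import metadata


def installed_tensorflow_packages(names=("tensorflow", "tensorflow-gpu")):
    """Map each pip distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def gpu_device_names(devices):
    """Return the names of devices whose device_type is exactly 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def visible_gpus():
    """List the GPUs TensorFlow reports, or None if TensorFlow is missing."""
    try:
        # Assumption: this internal module is importable in the installed version.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_device_names(device_lib.list_local_devices())


# Example use inside the container:
#   print(installed_tensorflow_packages())
#   print(visible_gpus())
```

If tensorflow-gpu shows a version but visible_gpus() returns an empty list, the problem is more likely the CUDA/driver setup in the container than the Python packages.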
@awjuliani ,
I created an image on Docker Hub for tensorflow-gpu: https://hub.docker.com/r/icaro56/ml-agents_images/tags/
But this image does not work. I changed tensorflow to tensorflow-gpu.
This error occurs when I try to run the training:
root@1cdd415e4b12:/workspace/unity-volume# mlagents-learn ./trainer_config.yaml --env=Bomberman --run-id=bomberman_test --train
Traceback (most recent call last):
File "/usr/local/bin/mlagents-learn", line 7, in
If you have an NVIDIA GPU you can give this Dockerfile a try and see if it works for you. It is derived from the Unity docker image but uses nvidia/cudagl/cudnn and nvidia-docker2. It will let you train using the GPU (in headless mode if desired).
https://github.com/mneilly/linux-unity-ml-agents-nvidia-docker
Thanks @mneilly. I will try this.
I am having a timeout problem. Look:
gpg: keyring `/tmp/tmp.M92x8F85ox/secring.gpg' created
gpg: keyring `/tmp/tmp.M92x8F85ox/pubring.gpg' created
gpg: requesting key AA65421D from hkp server keyserver.ubuntu.com
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
gpg: requesting key AA65421D from hkp server ha.pool.sks-keyservers.net
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
gpg: requesting key AA65421D from hkp server pgp.mit.edu
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
gpg: requesting key AA65421D from hkp server keyserver.pgp.com
gpg: keyserver timed out
gpg: keyserver receive failed: keyserver error
The command '/bin/sh -c export GNUPGHOME="$(mktemp -d)" && ( gpg --keyserver keyserver.ubuntu.com --recv-keys "$GPG_KEY" || gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" || gpg --keyserver pgp.mit.edu --recv-keys "$GPG_KEY" || gpg --keyserver keyserver.pgp.com --recv-keys "$GPG_KEY" ) && gpg --batch --verify python.tar.xz.asc python.tar.xz && rm -rf "$GNUPGHOME" python.tar.xz.asc && mkdir -p /usr/src/python && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz && rm python.tar.xz && cd /usr/src/python && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" && ./configure --build="$gnuArch" --enable-loadable-sqlite-extensions --enable-shared --with-system-expat --with-system-ffi --without-ensurepip && make -j "$(nproc)" && make install && ldconfig && apt-get purge -y --auto-remove $buildDeps && find /usr/local -depth \( \( -type d -a \( -name test -o -name tests \) \) -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) -exec rm -rf '{}' + && rm -rf /usr/src/python' returned a non-zero code: 2
Well... if none of the key servers is responding, then it looks like you are having a network issue and will need to try again later...
I changed the keyserver addresses to use port 80 and it works.
gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys "$GPG_KEY" \
|| gpg --keyserver hkp://ha.pool.sks-keyservers.net:80 --recv-keys "$GPG_KEY" \
|| gpg --keyserver hkp://pgp.mit.edu:80 --recv-keys "$GPG_KEY" \
|| gpg --keyserver hkp://keyserver.pgp.com:80 --recv-keys "$GPG_KEY" \
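The likely reason this helps is that hkp normally uses port 11371, which some firewalls block, while port 80 usually gets through. The same fallback chain could be sketched in Python roughly as follows. This is only an illustration of the port-80 workaround, not the Dockerfile's actual logic: the function names and the timeout value are assumptions, and the host list is simply the one from the log above.

```python
import subprocess

# Keyserver hosts taken from the failing build log in this thread.
HOSTS = [
    "keyserver.ubuntu.com",
    "ha.pool.sks-keyservers.net",
    "pgp.mit.edu",
    "keyserver.pgp.com",
]


def keyserver_urls(hosts, port=80):
    """Build explicit hkp:// URLs so gpg uses the given port, not 11371."""
    return [f"hkp://{host}:{port}" for host in hosts]


def recv_key_commands(key_id, hosts, port=80):
    """Return one gpg argument list per keyserver, in fallback order."""
    return [
        ["gpg", "--keyserver", url, "--recv-keys", key_id]
        for url in keyserver_urls(hosts, port)
    ]


def fetch_key(key_id, hosts=HOSTS, port=80, timeout=60):
    """Try each keyserver in turn; return the URL that worked, or None."""
    for cmd in recv_key_commands(key_id, hosts, port):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            continue  # gpg missing or server too slow: try the next one
        if result.returncode == 0:
            return cmd[2]
    return None
```

Calling fetch_key("AA65421D") would attempt the key ID shown in the log against each server on port 80 and stop at the first success.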
I hope the image works now! :)
The image is working now. But as @awjuliani said, "currently the implementation of PPO we use does not take great advantage of a GPU".
Hi. Try using the gpu or cpu tags:
icaro56/ml-agents_images:cpu
icaro56/ml-agents_images:gpu
I am using these in my research.
@icaro56 Thanks but it does not work. It gave me:
Docker image path: index.docker.io/icaro56/ml-agents_images:latest
ERROR MANIFEST_UNKNOWN: manifest unknown
Thanks for your quick reply @icaro56. I figured it out and deleted my message in a hurry. Sorry for the naive question. BTW, I'm currently building a Docker image that includes tensorflow-gpu, the NVIDIA driver, and an X server in order to train with visual observations on a server machine. Have you ever done it before? I'm running into issues setting up the X server while following the steps in https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-on-Amazon-Web-Service.md. Here is the Dockerfile I have so far. I still can't figure it out. Can anyone please help?
Have you ever done it before?
No, I have not.
I use only vector observations, and the GPU version of the Docker image runs at practically the same speed as the CPU version. :(
Maybe @mneilly can help you.
Ok @icaro56 Thanks.
@maystroh, I ran new ml-agents training sessions with both tensorflow-gpu and the CPU build of tensorflow, and what I'm seeing is that the CPU build trains faster than the GPU build.
The machine I use has 8 GPU cards in parallel.
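One way to make that comparison with a single tensorflow-gpu install is to hide the GPUs through the CUDA_VISIBLE_DEVICES environment variable and time the same command twice. The sketch below assumes that approach. The helper names and the run IDs are hypothetical, and pinning the GPU run to device 0 is an arbitrary choice for a multi-GPU machine.

```python
import os
import subprocess
import time


def env_for(device, base=None):
    """Copy an environment and restrict which CUDA devices are visible."""
    env = dict(os.environ if base is None else base)
    if device == "cpu":
        env["CUDA_VISIBLE_DEVICES"] = "-1"  # hide every GPU from TensorFlow
    elif device == "gpu":
        env["CUDA_VISIBLE_DEVICES"] = "0"   # assumption: use only the first GPU
    else:
        raise ValueError(f"unknown device: {device!r}")
    return env


def time_command(cmd, env):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def training_command(run_id):
    """The mlagents-learn call used earlier in this thread, with a new run ID."""
    return [
        "mlagents-learn", "./trainer_config.yaml",
        "--env=Bomberman", f"--run-id={run_id}", "--train",
    ]
```

For example, time_command(training_command("bomberman_cpu"), env_for("cpu")) followed by time_command(training_command("bomberman_gpu"), env_for("gpu")) gives two wall-clock figures for the same configuration. With vector observations only, similar or slower GPU times would be consistent with what was said above about PPO not benefiting much from a GPU.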
Hi,
Does training ML-Agents on a Linux machine with Docker speed up training? If the machine has CUDA capability, are the TensorFlow calculations done on the GPU?