Closed patterntrade closed 5 years ago
Hello, did you try running experiments with the shell script?
./run_docker_gpu.sh python train.py --algo ppo2 --env CartPole-v1
I am not 100% sure that the GPU image works (I have to fix a bug where TF is installed without GPU support); however, the CPU image works, as it is used for continuous integration.
Edit: for the files, that is normal (cf stable baselines doc where the command is explained)
The GPU image doesn't work; the error message looks like:
...
line 35, in
Resolved by running the following in the container:
source venv/bin/activate
pip install keras
pip install --upgrade tensorflow-gpu
Now it works!
Thanks for setting up this repository and the docker images, very helpful.
Merry Christmas!
:-)
Ok, I'll try to update the image then.
Hello again, I updated the docker image, it should be fixed now, can you confirm this?
Hi!
Thanks for writing.
Looking at GitHub, neither the Dockerfile nor the docker build file has been changed. Still, I tried…
docker@sddub:~/Downloads$ docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/root/code/stable-baselines,type=bind araffin/stable-baselines bash -c 'cd /root/code/stable-baselines/ && pytest tests/'
================================================ test session starts =================================================
platform linux -- Python 3.5.2, pytest-3.5.1, py-1.7.0, pluggy-0.6.0
rootdir: /root/code/stable-baselines, inifile:
plugins: cov-2.6.0
============================================ no tests ran in 0.00 seconds ============================================
ERROR: file not found: tests/
When going into the image by terminal and looking around with cd and ls:
root@cedb2fb4ba37:/# ls
bin dev home lib64 mnt proc run srv tmp var boot etc lib media opt root sbin sys usr
root@cedb2fb4ba37:/# cd root
root@cedb2fb4ba37:~# ls
code venv
root@cedb2fb4ba37:~# cd code
root@cedb2fb4ba37:~/code# ls
=0.10.9
I’m not sure if I’ve understood this right. Forgive me as I’m a novice to this. Was stable baselines supposed to be on board the docker container? It isn’t there. Was it supposed to be mapped/mounted to a stable baselines implementation on the host machine?
I looked thru the build file, there’s no mention of git stable baselines or similar there, only other dependencies.
Looking forward to hearing from you.
Kind regards
Are you using this dockerfile: https://github.com/araffin/rl-baselines-zoo/blob/master/docker/Dockerfile.gpu ?
Stable-Baselines is installed here
The built image: https://hub.docker.com/r/araffin/rl-baselines-zoo
EDIT: Oh, I see, since the beginning you seem to have been using the stable-baselines docker image instead of the rl zoo docker image.
Hi! Thanks for replying so quickly.
Yes, erroneously, I was using stable-baselines. I’ll get the RL-zoo image and try it out.
Still, it means that the documentation of stable-baselines needs to be updated, or the Dockerfiles/images need to be changed.
Kind regards
The doc is already updated ... cf. https://stable-baselines.readthedocs.io/en/master/guide/install.html#using-docker-images
"If you are looking for docker images with stable-baselines already installed in it, we recommend using images from RL Baselines Zoo.
Otherwise, the following images contained all the dependencies for stable-baselines but not the stable-baselines package itself. They are made for development."
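A minimal sketch of what this means in practice, assuming a local clone of stable-baselines: the development image expects the repository to be bind-mounted at /root/code/stable-baselines, so the docker run command has to be launched against the root of that clone (the directory containing tests/), not an arbitrary folder. The function name run_sb_tests and the DOCKER variable are hypothetical helpers for illustration; the docker run arguments are the ones quoted earlier in this thread.

```shell
#!/bin/bash
# DOCKER can be overridden (e.g. DOCKER=echo) to print the arguments instead of running docker.
DOCKER="${DOCKER:-docker}"

# run_sb_tests <repo_dir>: hypothetical helper, not part of stable-baselines.
# Refuses to start the container unless <repo_dir> looks like a stable-baselines clone.
run_sb_tests() {
  local repo_dir="${1:-$(pwd)}"
  if [ ! -d "$repo_dir/tests" ]; then
    echo "ERROR: $repo_dir has no tests/ directory; is it a stable-baselines clone?" >&2
    return 1
  fi
  # Bind-mount the clone where the image expects it, then run the test suite.
  "$DOCKER" run -it --runtime=nvidia --rm --network host --ipc=host --name test \
    --mount src="$repo_dir",target=/root/code/stable-baselines,type=bind \
    araffin/stable-baselines \
    bash -c 'cd /root/code/stable-baselines/ && pytest tests/'
}
```

Usage, assuming the clone lives in ~/stable-baselines: run_sb_tests ~/stable-baselines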
Hi!
Have twice now tried to run this on Ubuntu 18 desktop, two different installations, once natively, once with Docker (run_docker_gpu.sh). The image I’m using is araffin/rl-baselines-zoo. With both installations I have this issue:
Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
++ seq 1 10
and so forth (see below).
On the first installation, I thought I’d removed some lock files before. I’ve scoured the web for solutions to this issue, haven’t found anything. Would appreciate any ideas on how to address this.
Kind regards
REPOSITORY                 TAG        IMAGE ID       CREATED        SIZE
araffin/rl-baselines-zoo   latest     c799b5127cf3   9 days ago     3.85GB
nvidia/cuda                9.0-base   74f5aea45cf6   2 months ago   134MB

sudo bash run_docker_gpu.sh python train.py --algo ppo2 --env CartPole-v1
Executing in the docker (gpu image):
python train.py --algo ppo2 --env CartPole-v1

lshw
WARNING: you should run this program as super-user.
ub-desk
    description: Computer
    width: 64 bits
    capabilities: smp vsyscall32
  *-core
       description: Motherboard
       physical id: 0
     *-memory
          description: System memory
          physical id: 0
          size: 47GiB
     *-cpu
          product: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
          vendor: Intel Corp.
          physical id: 1
          bus info: cpu@0
          size: 1199MHz
          capacity: 3800MHz
          width: 64 bits
Ok, did you try the cpu image?
If it does not work with the cpu image, I'm afraid the problem may come from your machine, because the cpu image is tested at each push on Travis CI.
What you are seeing is the entrypoint.sh trying to create a fake X server in order to be able to launch any env that requires one.
Btw, why do you have to use sudo? Did you follow the post-installation?
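The error message itself points to a display that is already in use. A minimal sketch for finding an unused display number, assuming the usual X11 conventions of lock files at /tmp/.X<n>-lock and sockets at /tmp/.X11-unix/X<n>. The helper find_free_display is hypothetical and not part of entrypoint.sh. It only inspects the filesystem, so it may miss a display that is in use without a visible lock file or socket (for example, one owned by the host when the container shares the host network).

```shell
#!/bin/bash
# find_free_display [tmp_dir] [max]: hypothetical helper.
# Prints the first display number in 1..max that has neither a lock file
# nor a socket file under tmp_dir, and returns 1 if none is free.
find_free_display() {
  local tmp_dir="${1:-/tmp}"
  local max="${2:-99}"
  local n
  for n in $(seq 1 "$max"); do
    if [ ! -e "$tmp_dir/.X${n}-lock" ] && [ ! -e "$tmp_dir/.X11-unix/X${n}" ]; then
      echo "$n"
      return 0
    fi
  done
  return 1
}
```

The printed number could then be used as the display for the fake X server, for instance Xvfb ":$(find_free_display)".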
Hi!
Thanks for your speedy answer!
Tried the cpu image, same error.
Thanks for the hint about the post installation, did that.
So, it must be something with my system. Will have to figure that out.
Kind regards.
Hi again.
I modified entrypoint.sh, rebuilt the GPU image, ran the container:
ee@ub-desk:~/Desktop/docker$ bash run_docker_gpu.sh python train.py --algo ppo2 --env CartPole-v1
Executing in the docker (gpu image):
python train.py --algo ppo2 --env CartPole-v1
Traceback (most recent call last):
File "train.py", line 11, in
Pretty sure this is an error in the code, unrelated to the fake X server issue.
Do you have any suggestions?
Kind regards
I edited the entrypoint.sh to not try and make a fake X server. Then I can build and run. I don't think the errors in the previous post are due to CartPole trying to display something; it's in the code. Might it be an issue with the version of TensorFlow used?
Get:332 http://archive.ubuntu.com/ubuntu xenial/universe amd64 libopenmpi-dev amd64 1.10.2-8ubuntu1 [537 kB]
**debconf: delaying package configuration, since apt-utils is not installed**
Fetched 225 MB in 7min 7s (527 kB/s)
Successfully installed virtualenv-16.3.0
**You are using pip version 8.1.1, however version 19.0.1 is available.**
You should consider upgrading via the 'pip install --upgrade pip' command.
Using base prefix '/usr'
New python executable in /root
**Collecting joblib (from stable-baselines==2.4.0)
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/joblib/**
Downloading https://files.pythonhosted.org/packages/49/d9/4ea194a4c1d0148f9446054b9135f47218c23ccc6f649aeb09fab4c0925c/joblib-0.13.1-py2.py3-none-any.whl (278kB)
Successfully built html5lib
**tensorflow 1.12.0 has requirement tensorboard<1.13.0,>=1.12.0, but you'll have tensorboard 1.8.0 which is incompatible.**
Installing collected packages: html5lib, bleach, tensorboard, tensorflow-gpu
So docker build gave some warnings, but for some reason built the image anyway. I'm not sure whether that explains the issues in the previous entry.
Now, every time I try to build a new Docker image, it just uses local files. Not sure how I can force it to redo from download, or if that has any merit at all.
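Docker reuses cached layers by default, which is why the build appears to use local files. A minimal sketch for forcing a full rebuild, assuming the Dockerfile path docker/Dockerfile.gpu from the rl-baselines-zoo repository linked earlier in the thread. The helper name rebuild_image, the DOCKER variable, and the tag my-rl-zoo-gpu are hypothetical. The --no-cache flag ignores cached layers and --pull re-fetches the base image.

```shell
#!/bin/bash
# DOCKER can be overridden (e.g. DOCKER=echo) to print the arguments instead of running docker.
DOCKER="${DOCKER:-docker}"

# rebuild_image <tag> <dockerfile> [context]: hypothetical helper.
# Rebuilds every layer from scratch and refreshes the base image.
rebuild_image() {
  local tag="$1"
  local dockerfile="$2"
  local context="${3:-.}"
  "$DOCKER" build --no-cache --pull -t "$tag" -f "$dockerfile" "$context"
}
```

Usage, from the root of an rl-baselines-zoo clone: rebuild_image my-rl-zoo-gpu docker/Dockerfile.gpu .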
Have rl-baselines-zoo, GPU edition, pulled, not built.
Trying to run:
docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/root/code/stable-baselines,type=bind araffin/stable-baselines bash -c 'cd /root/code/stable-baselines/ && pytest tests/'
Am running:
sudo docker run --runtime=nvidia -it araffin/stable-baselines bash
Traversing into /root/code/, the directory is empty. It seems there is something wrong about the repository. Similar issues with the rl-zoo image.
I have little experience with docker, so I might well have missed something.
Kind regards