floydhub / dockerfiles

Deep Learning Dockerfiles
https://docs.floydhub.com/guides/environments/
Apache License 2.0
156 stars 57 forks

vnc for openAI gym #31

Open AwokeKnowing opened 7 years ago

AwokeKnowing commented 7 years ago

openAI gym works in the pytorch image, but actually running the atari environments fails due to the lack of X11. What's the best way to make the environments visible? I tried adding a desktop with VNC, but it broke the GPU support.

Strangely enough, in my attempts the atari environments would run when I started the container with plain docker. But when I start the container with nvidia-docker (GPU enabled), the GPU is definitely working (torch.cuda.is_available() returns True), yet the atari games error out with:

File "/usr/local/lib/python3.5/site-packages/pyglet/gl/glx_info.py", line 83, in have_version
    raise GLXInfoException('pyglet requires an X server with GLX')
pyglet.gl.glx_info.GLXInfoException: pyglet requires an X server with GLX

To be clear, I have an Nvidia GPU. My goal is to work with pytorch and openai gym. Your pytorch image is perfect, except that I need to see the atari environments. I've been trying to get them to display for 16 hours. I can follow instructions, but I can't seem to find any.
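For anyone hitting the same GLXInfoException: pyglet only needs a reachable X server, so a tiny guard like the sketch below (my own hypothetical helper, not part of gym or pyglet) can turn the deep traceback into an actionable message before render() is ever called:

```python
import os

def require_x_display():
    """Fail fast, with a clearer message than pyglet's GLXInfoException,
    when no X server is reachable. Hypothetical helper, not part of gym."""
    if not os.environ.get("DISPLAY"):
        raise RuntimeError(
            "No DISPLAY set: start an X server (or Xvfb) and export DISPLAY "
            "in the container before calling env.render()."
        )

# Demonstrate both paths without needing a real X server:
os.environ.pop("DISPLAY", None)
try:
    require_x_display()
    outcome = "ok"
except RuntimeError:
    outcome = "no display"
print(outcome)  # no display
```

An unset DISPLAY is only the most common cause; GLX can still be missing with DISPLAY set, so this is a first-line check, not a full diagnosis.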

houqp commented 7 years ago

Could you send us the floydhub job URL so I can try to reproduce the error on my side?

AwokeKnowing commented 7 years ago

Hi, I'm actually running this locally. My goal is to be able to seamlessly transition from docker locally to docker on floydhub or elsewhere.

I have seen some people say that the trick is to install the nvidia drivers with the --no-opengl-files option: https://davidsanwald.github.io/2016/11/13/building-tensorflow-with-gpu-support.html

So again, all I'm doing is essentially nvidia-docker run -it -p5959:5900 floydhub/pytorch:0.1.11-gpu-py3.6 bash. There I do see my two GPUs when running nvidia-smi, but when running a simple OpenAI gym (pong) environment under xvfb I get the above error.

Note, I built this entire machine from scratch precisely for working locally within Docker on OpenAI gym and pytorch, so I'm willing to do anything up to and including reformatting the whole machine. I just need a reliable setup, and floydhub/pytorch has everything except a way to actually see the environments. Again, the environments do run when I launch floydhub/pytorch with plain docker (vs nvidia-docker), but then no GPU is found.

houqp commented 7 years ago

Could you create a minimal reproducible example on Floydhub? If we can reliably reproduce it on our side, that will cut down the time it takes us to track down the root cause.

AwokeKnowing commented 7 years ago

@houqp yesterday I re-installed ubuntu from scratch, updated packages, blacklisted nouveau, installed build-essential, and installed the nvidia runfile driver (375.86) with --no-opengl-files, saying "no" to any offers to modify xconfig. Then I installed docker and nvidia-docker and ran floydhub/pytorch; the nvidia drivers are loaded and there are no errors when running against xvfb.

So the error part of this issue can be pinned on the ubuntu PPA nvidia driver (it has no option to install without messing up OpenGL).

However, now that everything is working correctly, the title of this issue comes into play. Since openAI gym is a graphical environment, what is the recommended way to "see" the environments rendered when running floydhub/pytorch locally (i.e. with a Monitor)? Does the image have VNC installed? How do I run it, and how do I connect from the host OS?

Obviously I can make my own image on top of this, but is there already a vnc setup in the image?

What I suggest (though I defer to your experience) is a script similar to your run_jupyter.sh, say run_vnc.sh, that exposes 5999 or whatever. It's not necessary to install any "desktop" or X server; xvfb will do fine (at least for gym), so just a VNC server to optionally "see" what's going on.
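A rough sketch of what such a script could contain (the script name, display number, port, and the choice of x11vnc are my assumptions, not anything currently shipped in the image):

```shell
# Generate a hypothetical run_vnc.sh: virtual framebuffer + VNC, no desktop.
cat > run_vnc.sh <<'EOF'
#!/bin/bash
# Start a virtual framebuffer on display :1 (no desktop environment needed)
Xvfb :1 -screen 0 1024x768x24 &
export DISPLAY=:1
# Serve display :1 over VNC on port 5900; -forever survives client disconnects
x11vnc -display :1 -rfbport 5900 -forever -nopw &
# Hand off to whatever the user wants to run, e.g. python agent.py
exec "$@"
EOF
chmod +x run_vnc.sh
```

One would then publish the port (e.g. -p 5900:5900) and point a VNC client on the host at localhost:5900.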

You know what would be really great? If someone figured out how to render live atari games within jupyter... /offtopic

AwokeKnowing commented 7 years ago

Note: here I posted a question describing the exact setup (just 2 commands on top of floydhub/pytorch) https://askubuntu.com/questions/942768/how-can-i-properly-run-openai-gym-with-nvidia-docker-and-see-the-environments

You can see that the rendering is wrong, which underscores the need for floydhub to have this already set up in the image.

I would truly appreciate it if someone with experience installing all these packages would include this.

I suggest a new image, floydhub/pytorch-gui, built FROM floydhub/pytorch plus whatever is needed to run OpenAI gym, matplotlib etc. over VNC.

Then we could just use the image, connect over VNC, and get right to work, which is floydhub's goal. It's crazy that I've spent 12 hours just trying to get a visual of openAI gym (and it's still not working), and it's sad that each developer would need to do the same before starting on deep learning code. Not everything is web-based or console-based, so VNC is needed.

houqp commented 7 years ago

I think it would be useful to add VNC streaming support to the base image so all environments will have access to it. I noticed you also installed fluxbox; is that a hard dependency?

Sorry, we are working on releasing a major update to the platform, so I won't have time to work on this in the coming weeks. That said, you are welcome to submit a patch to the dockerfile at https://github.com/floydhub/dockerfiles/blob/master/dl/dl-base/dl-base-1.x.x.jinja. I can help test and build the new image for you.

AwokeKnowing commented 7 years ago

@houqp fluxbox is not a hard dependency; I just found some example that used it. I have posted on several forums, including askubuntu and devtalk.nvidia.com, but have not yet found the best way. I suppose a very lightweight desktop in the base might be the best approach.

houqp commented 7 years ago

I have a feeling that there is still room for optimization; we can probably get away without a desktop environment.

AwokeKnowing commented 7 years ago

@houqp So I got sidetracked from this project doing the Coursera DeepLearning Specialization, but when that's done I really want to nail down the stack for local, docker-based OpenAI gym development (especially atari).

You're probably right that a full desktop is overkill. We just need working VNC of the gym environment(s) and any plot windows as well. I'm not familiar enough to know the best-practices way to do that.

houqp commented 7 years ago

Sounds good, looking forward to working with you on that when you finish the class. This will benefit a lot of people :)

AwokeKnowing commented 7 years ago

So based on the info at openAI for universe, I did the following:

 nvidia-docker run --privileged --rm -it \
 -e DOCKER_NET_HOST=172.17.0.1 \
 -v /var/run/docker.sock:/var/run/docker.sock \
 floydhub/pytorch:0.1.11-gpu-py3.6 bash

It seems to be important to map the docker socket in so universe can start its own docker containers.

With this setup I can run the environments and they appear in VNC at localhost:5900 or on the web at localhost:15900 (they have a built-in web-based VNC client).

As an example, inside the above container I can put the code below in a python script, run it, and view it from my machine at that web address.

import gym
import universe  # register the universe environments

env = gym.make('gym-core.Pong-v3')
env.configure(remotes=1)  # automatically creates a local docker container
observation_n = env.reset()

while True:
  action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]  # your agent here
  observation_n, reward_n, done_n, info = env.step(action_n)
  env.render()


However, this is all for the 'universe' version of the atari environments, not the native gym ones. The difference is that the universe ones are wrapped in docker and the observation comes over VNC, so the frame rate is capped, there is network overhead (so training is slower), and the reward signal lags. So I would still like to be able to do the same with the native gym atari environments, which are also already installed in the floydhub/pytorch image.
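For comparison, the native gym loop is simpler than the universe one above: step() takes a single action and returns a plain (observation, reward, done, info) tuple rather than universe's per-remote lists. A minimal sketch of that loop shape, using a stub environment (StubEnv is my own stand-in for gym.make('Pong-v0'), so it runs without ROMs or a display):

```python
import random

class StubEnv:
    """Stand-in for a native gym env: step() returns (obs, reward, done, info)."""
    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return [float(self.t)], 0.0, done, {}

env = StubEnv()
obs = env.reset()
done, steps = False, 0
while not done:
    # a random agent; a real one would choose an action from obs
    obs, reward, done, info = env.step(random.choice([0, 1]))
    steps += 1
print(steps)  # 10
```

With a real env you would also call env.render() inside the loop, which is exactly the call that needs a working display.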

But this is a good start: pytorch + atari in one floydhub image with a simple command line, without installing anything else.

AwokeKnowing commented 7 years ago

Ok, so I now believe the best way for local development will be X forwarding. No need to install anything else in the image.

However, I believe something in the floydhub stack has corrupted the normal rendering. Some other environments work (the flash ones), but all the atari ones render with a corrupted display.

To reproduce, run:

docker run -it --user=$(id -u) --env="DISPLAY" --workdir="/home/$USER" \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" floydhub/pytorch:0.1.11-gpu-py3.6 bash

Now inside the container, run python and type the following:

import gym
gym.make('Pong-v0').render()

That should open an X-forwarded window on your machine, but the display is corrupt (at least for me).

Above I actually used SpaceInvaders-v0

So the point is that there is no need to install a 'desktop' or VNC or anything; X forwarding lets you view the windows just fine for local development. However, something (OpenGL, perhaps) is messed up in the floydhub/pytorch image.

Please try the above and let me know if you can reproduce it. If it works fine for you then I guess something is wrong with my drivers.

houqp commented 7 years ago

This is awesome, I am going to look into it today.

houqp commented 7 years ago

I am able to reproduce the garbled screen on my laptop. One thing I noticed, though, is that CartPole-v0 works fine, so this is not happening to all the games.

EDIT: Sorry, I missed a detail in your previous comment. You were trying to get the atari games working. It looks like none of the atari games render properly.

houqp commented 7 years ago

On the other hand, I am able to save the screen locally as an image with correct content:

import gym

env = gym.make('Pong-v0')
env.reset()
env.env.ale.saveScreenPNG('test_image.png')

AwokeKnowing commented 7 years ago

In the ALE manual, I found this: [screenshot of the ALE manual excerpt on screen formats]

So ALE's default is not RGB but a palette. I'm thinking maybe the palette values are somehow being sent to the 'observation' that gym gets, instead of RGB pixels, and gym is passing the palette values to the render call as if they were RGB values. That would explain why each line seems to repeat about 3 times and the colors are wrong.

I wonder if gym or ALE are using different 'default' settings when they run in docker vs normal install.
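That hypothesis is easy to illustrate with numpy (the lookup table below is a dummy stand-in, not ALE's actual palette): a 210x160 one-byte-per-pixel screen holds 33600 bytes, and reading those bytes as packed RGB triples yields only a third as many pixels, which would scramble both the colors and the geometry:

```python
import numpy as np

# A 210x160 ALE-style screen with one palette index per pixel (dummy data)
palette_screen = (np.arange(210 * 160) % 256).astype(np.uint8).reshape(210, 160)

# Correct handling: look each index up in a 256-entry RGB table
# (a real emulator would use ALE's palette; this grayscale table is a stand-in)
rgb_table = np.stack([np.arange(256, dtype=np.uint8)] * 3, axis=1)  # (256, 3)
decoded = rgb_table[palette_screen]   # (210, 160, 3) - a proper RGB frame

# Buggy handling: treat the raw palette bytes as if already RGB triples
garbled = palette_screen.reshape(-1, 3)   # 33600/3 = 11200 "pixels" left

print(decoded.shape, garbled.shape)
```

Whether this exact confusion is what the floydhub image triggers is unconfirmed; it just shows the kind of artifact a palette/RGB mix-up produces.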

houqp commented 7 years ago

Yeah, I think the next step is to skip the gym abstraction and troubleshoot this with vanilla ALE.

sagelywizard commented 6 years ago

Any update? Running into the same issue.

houqp commented 6 years ago

Unfortunately, I haven't had time to dig into it. @sagelywizard are you interested in helping?

AwokeKnowing commented 6 years ago

@houqp @sagelywizard I haven't had a chance to get back to this as I've been doing the 3rd and 4th courses in the Andrew Ng Deep Learning Specialization in coursera.

However, I did actually try installing gym on the official pytorch image, and that worked with no graphics glitches. So I don't think the solution is in troubleshooting ALE directly, but more like divide-and-conquer: removing other dependencies from the image.

see this thread: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/issues/9

But it confirms we shouldn't need any 'desktop' or anything special to get it working. It would be really nice to have this working on the floydhub images because they are so well organized and have the other needed tools already built in.

houqp commented 6 years ago

@AwokeKnowing this is very useful info, thanks! We can start with the https://hub.docker.com/r/floydhub/dl-base image and see if the glitches still happen at the base layer.

AwokeKnowing commented 6 years ago

@houqp So I finished the DeepLearning specialization and looked back at this issue. Now I have found the sad reality: the openai universe team seems not to be working closely with the gym team (or maybe they abandoned universe and are all working on gym).

It turns out that universe requires a very old version of gym; there have been a number of breaking changes since then. If you update gym, the graphics issue is completely fixed and everything works fine with X forwarding of the native atari gym envs. However, updating gym breaks universe, which is not compatible with the newer gym.

I'm not sure how you would like to handle that. There's more activity on gym than on universe, mostly, I guess, because universe is a much harder problem due to lag and is more resource-intensive.

So one solution would be to install the new gym and leave universe out; then you could have a separate universe image (perhaps on top of the pytorch image, which has about everything).

Or possibly you could figure out how to do it in a virtualenv and have an -e GYM=universe environment flag that somehow selects whether you want the current gym or the very old gym for universe.

But overall, I believe that at this point having an up-to-date gym is more in line with current trends. Perhaps (or perhaps not) they will eventually refactor universe, but my expectation from what I have read is that it would be a significant refactor and is not coming soon.

Edit: In fact, it seems universe is not going to be further developed: https://github.com/openai/universe/issues/218

So I'm officially suggesting you just remove it and install the current gym. Something like:

git clone https://github.com/openai/gym.git
cd gym
pip install -e '.[classic_control,atari,box2d]'

Edit: For testing, I basically concatenated all the pytorch parent dockerfiles into one long Dockerfile, then based it on the new cudagl image for OpenGL support. On that image, gym works fine once installed. It's unclear exactly why just installing gym on the floydhub pytorch image doesn't work; all I can think of is that it's because of OpenGL.

For reference, here's the Dockerfile I used. It basically runs through all the same stuff as floydhub/pytorch and then adds gym at the end. You can run it like this (on a local machine with nvidia):

docker run --runtime=nvidia -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix awokeknowing/aitools

FROM nvidia/cudagl:9.0-devel-ubuntu16.04
LABEL maintainer "jamesdavidmorris@gmail.com"

####dependencies

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        sudo \
        ca-certificates \
        curl \
        wget \
        bzr \
        git \
        mercurial \
        openssh-client \
        subversion \
        procps \
        autoconf \
        automake \
        bzip2 \
        file \
        g++ \
        gcc \
        imagemagick \
        libbz2-dev \
        libc6-dev \
        libcurl4-openssl-dev \
        libdb-dev \
        libevent-dev \
        libffi-dev \
        libgdbm-dev \
        libgeoip-dev \
        libglib2.0-dev \
        libjpeg-dev \
        libkrb5-dev \
        liblzma-dev \
        libmagickcore-dev \
        libmagickwand-dev \
        libncurses-dev \
        libpng-dev \
        libpq-dev \
        libreadline-dev \
        libsqlite3-dev \
        libssl-dev \
        libtool \
        libwebp-dev \
        libxml2-dev \
        libxslt-dev \
        libyaml-dev \
        make \
        patch \
        xz-utils \
        zlib1g-dev \
        # https://lists.debian.org/debian-devel-announce/2016/09/msg00000.html
        $( \
        # if we use just "apt-cache show" here, it returns zero because "Can't select versions from package 'libmysqlclient-dev' as it is purely virtual", hence the pipe to grep
            if apt-cache show 'default-libmysqlclient-dev' 2>/dev/null | grep -q '^Version:'; then \
            echo 'default-libmysqlclient-dev'; \
            else \
            echo 'libmysqlclient-dev'; \
            fi \
        ) \
    && rm -rf /var/lib/apt/lists/*

# This updates the global environment for the root user
RUN echo "LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH" >> /etc/environment
RUN echo "LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LIBRARY_PATH" >> /etc/environment

################ python
# From https://github.com/docker-library/python/blob/master/3.6/Dockerfile

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

# runtime dependencies
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        tcl \
        tk \
    && rm -rf /var/lib/apt/lists/*

ENV GPG_KEY 0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
ENV PYTHON_VERSION 3.6.2

RUN set -ex \
    && buildDeps=' \
        dpkg-dev \
        tcl-dev \
        tk-dev \
        wget \
        ca-certificates \
    ' \
    && apt-get update \
        && apt-get install -y $buildDeps --no-install-recommends \
        && rm -rf /var/lib/apt/lists/* \
    \
    && wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" \
    && wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" \
    && export GNUPGHOME="$(mktemp -d)" \
    && gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" \
    && gpg --batch --verify python.tar.xz.asc python.tar.xz \
    && rm -r "$GNUPGHOME" python.tar.xz.asc \
    && mkdir -p /usr/src/python \
    && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz \
    && rm python.tar.xz \
    \
    && cd /usr/src/python \
    && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
    && ./configure \
        --build="$gnuArch" \
        --enable-loadable-sqlite-extensions \
        --enable-shared \
        --with-system-expat \
        --with-system-ffi \
        --without-ensurepip \
    && make -j$(nproc) \
    && make install \
    && ldconfig \
    && find /usr/local -depth \
        \( \
            \( -type d -a -name test -o -name tests \) \
            -o \
            \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \
        \) -exec rm -rf '{}' + \
    && rm -rf /usr/src/python ~/.cache
# make some useful symlinks that are expected to exist
RUN cd /usr/local/bin \
    && { [ -e easy_install ] || ln -s easy_install-* easy_install; } \
    && ln -s idle3 idle \
    && ln -s pydoc3 pydoc \
    && ln -s python3 python \
    && ln -s python3-config python-config

# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
ENV PYTHON_PIP_VERSION 9.0.1

RUN set -ex; \
    \
    wget -O get-pip.py 'https://bootstrap.pypa.io/get-pip.py'; \
    \
    python get-pip.py \
        --disable-pip-version-check \
        --no-cache-dir \
        "pip==$PYTHON_PIP_VERSION" \
    ; \
    pip --version; \
    \
    find /usr/local -depth \
        \( \
            \( -type d -a \( -name test -o -name tests \) \) \
            -o \
            \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
        \) -exec rm -rf '{}' +; \
    rm -f get-pip.py

# install "virtualenv", since the vast majority of users of this image will want it
RUN pip install --no-cache-dir virtualenv

##################dependencies

# Add Bazel distribution URI as a package source
RUN echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list \
    && curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

# Install some dependencies
RUN apt-get update && apt-get install -y \
        ant \
        apt-utils \
        bazel \
        bc \
        build-essential \
        cmake \
        default-jdk \
        doxygen \
        gfortran \
        golang \
        iptables \
        libav-tools \
        libboost-all-dev \
        libeigen3-dev \
        libfreetype6-dev \
        libhdf5-dev \
        libjpeg-turbo8-dev \
        liblcms2-dev \
        libopenblas-dev \
        liblapack-dev \
        libpng12-dev \
        libprotobuf-dev \
        libsdl2-dev \
        libpython3-dev \
        libtiff-dev \
        libtiff5-dev \
        libvncserver-dev \
        libzmq3-dev \
        nano \
        net-tools \
        openmpi-bin \
        pkg-config \
        protobuf-compiler \
        python3-dev \
        python3-opengl \
        python3-tk \
        python-software-properties \
        rsync \
        software-properties-common \
        swig \
        unzip \
        vim \
        webp \
        xorg-dev \
        xvfb \
    && apt-get clean \
    && apt-get autoremove \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf /var/cache/apt/archives/* \
# Link BLAS library to use OpenBLAS using the alternatives mechanism (https://www.scipy.org/scipylib/building/linux.html#debian-ubuntu)
    && update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3

# Install Git LFS
RUN apt-get update \
    && add-apt-repository ppa:git-core/ppa \
    && curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash && \
    apt-get install -y git-lfs \
    && git lfs install \
    && apt-get clean \
    && apt-get autoremove \
    && rm -rf /var/cache/apt/archives/* \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get clean && apt-get update && apt-get install -y \
        build-essential \
        cmake \
        gcc \
        apt-utils \
        pkg-config \
        make \
        nasm \
        wget \
        unzip \
        git \
        ca-certificates \
        curl \
        vim \
        nano \
        python3 \
        python3-pip \
        python3-dev \
        python3-numpy \
        gfortran \
        libatlas-base-dev \
        libatlas-dev \
        libatlas3-base \
        libhdf5-dev \
        libfreetype6-dev \
        libjpeg-dev \
        libpng-dev \
        libtiff-dev \
        libxml2-dev \
        libxslt-dev \
        libav-tools \
        libavcodec-dev \
        libavformat-dev \
        libxvidcore-dev \
        libx264-dev \
        x264 \
        libdc1394-22-dev \
        libswscale-dev \
        libv4l-dev \
        libsdl2-dev \
        swig \
        libboost-program-options-dev \
        libboost-all-dev \
        libboost-python-dev \
        zlib1g-dev \
        libjasper-dev \
        libtbb2 \
        libtbb-dev \
        libgl1-mesa-glx \
        qt5-default \
        libqt5opengl5-dev \
        xvfb \
        xorg-dev \
        x11-apps \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# add ffmpeg with cuda support

RUN git clone --depth 1 --branch n3.4.1 https://github.com/ffmpeg/ffmpeg ffmpeg && \
    cd ffmpeg && \
    ./configure --enable-cuda --enable-cuvid --enable-nvenc --enable-nonfree --enable-libnpp \
                --extra-cflags=-I/usr/local/cuda/include \
                --extra-ldflags=-L/usr/local/cuda/lib64 \
                --prefix=/usr/local/ffmpeg --enable-shared --disable-static \
                --disable-manpages --disable-doc --disable-podpages && \
                make -j"$(nproc)" install && \
                ldconfig

ENV NVIDIA_DRIVER_CAPABILITIES $NVIDIA_DRIVER_CAPABILITIES,video

RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-npp-9-0 && \
    rm -rf /var/lib/apt/lists/*

#ENTRYPOINT ["ffmpeg"]

#WORKDIR /tmp
#CMD ["-y", "-hwaccel", "cuvid", "-c:v", "h264_cuvid", "-vsync", "0", "-i", \
#     "http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4", \
#     "-vf", "scale_npp=1280:720", "-vcodec", "h264_nvenc", "-t", "00:02:00", "output.mp4"]
#

RUN pip --no-cache-dir install \
        Cython \
        h5py \
        ipykernel \
        jupyter \
        matplotlib \
        numpy \
        cupy \
        pandas \
        path.py \
        pyyaml \
        scipy \
        six \
        sklearn \
        sympy \
        Pillow \
        zmq \
        && \
    python -m ipykernel.kernelspec

# Set up our notebook config.
COPY jupyter_notebook_config_py3.py /root/.jupyter/
RUN mv /root/.jupyter/jupyter_notebook_config_py3.py /root/.jupyter/jupyter_notebook_config.py

# Jupyter has issues with being run directly:
#   https://github.com/ipython/ipython/issues/7062
# We just add a little wrapper script.
COPY run_jupyter.sh /
RUN chmod +x /run_jupyter.sh

# IPython
EXPOSE 8888

############# opencv
ARG OPENCV_VERSION=3.4.0

RUN apt-get update && apt-get install -y \
        python-opencv \
        libavcodec-dev \
        libavformat-dev \
        libav-tools \
        libavresample-dev \
        libdc1394-22-dev \
        libgdal-dev \
        libgphoto2-dev \
        libgtk2.0-dev \
        libjasper-dev \
        liblapacke-dev \
        libopencore-amrnb-dev \
        libopencore-amrwb-dev \
        libopencv-dev \
        libopenexr-dev \
        libswscale-dev \
        libtbb2 \
        libtbb-dev \
        libtheora-dev \
        libv4l-dev \
        libvorbis-dev \
        libvtk6-dev \
        libx264-dev \
        libxine2-dev \
        libxvidcore-dev \
        qt5-default \
        && \
    apt-get clean && \
    apt-get autoremove && \
    rm -rf /var/lib/apt/lists/*

RUN cd ~/ && \
    git clone https://github.com/Itseez/opencv.git --branch ${OPENCV_VERSION} --single-branch && \
    git clone https://github.com/Itseez/opencv_contrib.git --branch ${OPENCV_VERSION} --single-branch && \
    cd opencv && \
    mkdir build && \
    cd build && \
    cmake -D CMAKE_BUILD_TYPE=RELEASE \
        -DWITH_QT=ON \
        -DWITH_OPENGL=ON \
        -D WITH_CUDA=ON \
        -D CUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so \
        -D ENABLE_FAST_MATH=1 \
        -D CUDA_FAST_MATH=1 \
        -D WITH_CUBLAS=1 \
        -DFORCE_VTK=ON \
        -DWITH_TBB=ON \
        -DWITH_GDAL=ON \
        -DWITH_XINE=ON \
        -DBUILD_EXAMPLES=ON \
        -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
        .. && \
    make -j"$(nproc)" && \
    make install && \
    ldconfig && \
 # Remove the opencv folders to reduce image size
    rm -rf ~/opencv*

#### open ai

# Add Tensorboard
RUN apt-get update && apt-get install -y supervisor \
  && apt-get clean \
  && apt-get autoremove \
  && rm -rf /var/cache/apt/archives/* \
  && rm -rf /var/lib/apt/lists/*
COPY tensorboard.conf /etc/supervisor/conf.d/

# graphviz for visualization
RUN apt-get update && apt-get install -y \
        graphviz \
    && apt-get clean \
    && apt-get autoremove \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf /var/cache/apt/archives/*

RUN pip --no-cache-dir install \
        pydot \
        dlib \
        incremental \
        nltk \
        gym[atari,box2d,classic_control] \
        textacy \
        scikit-image \
        spacy \
        tqdm \
        wheel \
        kaggle-cli \
        annoy \
    && rm -rf /tmp/* /var/tmp/*

# Install OpenAI Universe
RUN git clone --branch v0.21.3 https://github.com/openai/universe.git \
    && cd universe \
    && pip install . \
    && cd .. \
    && rm -rf universe

# Install xgboost
RUN git clone --recursive https://github.com/dmlc/xgboost \
    && cd xgboost \
    && mkdir build \
    && cd build \
    && cmake .. -DUSE_CUDA=ON \
    && make -j$(nproc) \
    && cd .. \
    && cd python-package \
    && python setup.py install \
    && cd ../.. \
    && rm -rf xgboost

############## tensorflow and keras
RUN pip --no-cache-dir install tf-nightly-gpu

# Install Keras and tflearn
RUN pip --no-cache-dir install git+git://github.com/fchollet/keras.git@${KERAS_VERSION} \
        tflearn==0.3.2 \
    && rm -rf /pip_pkg \
    && rm -rf /tmp/* \
    && rm -rf /root/.cache

##### pytorch
RUN pip --no-cache-dir install --upgrade http://download.pytorch.org/whl/cu90/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl \
    tensorboardX \
    torchvision==0.2.0

#### OpenAI gym
RUN apt-get update && apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig libgtk2.0-dev && git clone https://github.com/openai/gym.git && cd gym && pip install -e '.[classic_control,box2d,atari]' 

RUN printf "import gym\nenv = gym.make(\"SpaceInvaders-v0\")\nenv.reset()\nfor i in range(900):\n  env.step(env.action_space.sample())\n  env.render()\ninput()" >> testgym.py

CMD python testgym.py

So please try it with:

xhost +si:localuser:root
docker run --runtime=nvidia -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix awokeknowing/aitools

That should pop up something like: [screenshot: Space Invaders rendering correctly]

houqp commented 6 years ago

Thanks @AwokeKnowing for digging into the root cause! I agree with you: if universe is no longer active, we should exclude it from the default package list. We are about to rebuild all images for a newer version of cuda and will incorporate your change with the new release.

houqp commented 6 years ago

@AwokeKnowing we released a new set of images (pytorch-0.3.1 and tensorflow-1.7) last week with universe removed. Do you mind giving it a try? I will test this on a linux machine with a GUI environment later today.