floydhub / dl-docker

An all-in-one Docker image for deep learning. Contains all the popular DL frameworks (TensorFlow, Theano, Torch, Caffe, etc.)
https://www.floydhub.com

Fixed some issues with GPU Dockerfile #73

Open pbamotra opened 7 years ago

pbamotra commented 7 years ago

Fixes the cuDNN 6 / libcudnn.so.6 issue and #59. Also updates the deep learning libraries to their latest versions, upgrades pandas and sklearn, and adds some of my favorite Python libraries.
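TensorFlow 1.3 GPU wheels are built against cuDNN 6, so the dynamic loader has to resolve `libcudnn.so.6` at import time. The usual layout for a versioned shared library is a chain of symbolic links, `libcudnn.so` -> `libcudnn.so.6` -> `libcudnn.so.6.0.21`. A minimal bash sketch of rebuilding that chain, assuming the lib directory holds exactly one fully versioned `libcudnn.so.<major>.*` file; `relink_cudnn` is a hypothetical helper name, not part of this repo:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild libcudnn.so -> libcudnn.so.<major> -> libcudnn.so.<major>.<minor>.<patch>.
# Assumes exactly one fully versioned file (e.g. libcudnn.so.6.0.21) in the directory.
set -euo pipefail

relink_cudnn() {
  local libdir="$1" major="$2"
  # Glob for the fully versioned library; without nullglob an unmatched
  # pattern stays literal, which the -f test below catches.
  local matches=("$libdir"/libcudnn.so."$major".*)
  if [ ! -f "${matches[0]}" ]; then
    echo "no libcudnn.so.$major.* found in $libdir" >&2
    return 1
  fi
  local real
  real=$(basename "${matches[0]}")
  # -s creates symbolic links; -f replaces any file or link already there
  ln -sf "$real" "$libdir/libcudnn.so.$major"
  ln -sf "libcudnn.so.$major" "$libdir/libcudnn.so"
}

# Usage inside the image (as root):
#   relink_cudnn /usr/local/cuda/lib64 6 && ldconfig
```

Symbolic links (rather than plain copies or hard links) are what `ldconfig` expects for the `.so.<major>` names, and it warns when it finds regular files there.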

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04

LABEL authors="Sai Soundararaj <saip@outlook.com>, Pankesh Bamotra <pankesh.bamotra@gmail.com>"

ARG THEANO_VERSION=rel-0.9.0
ARG TENSORFLOW_VERSION=1.3.0
ARG TENSORFLOW_ARCH=gpu
ARG KERAS_VERSION=2.0.8
ARG LASAGNE_VERSION=v0.1
ARG TORCH_VERSION=latest
ARG CAFFE_VERSION=master
ARG CUDNN_TAR_FILE=cudnn-8.0-linux-x64-v6.0.tgz

#RUN echo -e "\n**********************\nNVIDIA Driver Version\n**********************\n" && \
#   cat /proc/driver/nvidia/version && \
#   echo -e "\n**********************\nCUDA Version\n**********************\n" && \
#   nvcc -V && \
#   echo -e "\n\nBuilding your Deep Learning Docker Image...\n"

# Install some dependencies
RUN apt-get update && apt-get install -y \
        bc \
        build-essential \
        cmake \
        curl \
        g++ \
        gfortran \
        git \
        libffi-dev \
        libfreetype6-dev \
        libhdf5-dev \
        libjpeg-dev \
        liblcms2-dev \
        libopenblas-dev \
        liblapack-dev \
        libopenjpeg2 \
        libpng12-dev \
        libssl-dev \
        libtiff5-dev \
        libwebp-dev \
        libzmq3-dev \
        nano \
        pkg-config \
        python-dev \
        software-properties-common \
        unzip \
        vim \
        wget \
        zlib1g-dev \
        qt5-default \
        libvtk6-dev \
        libpng-dev \
        libjasper-dev \
        libopenexr-dev \
        libgdal-dev \
        libdc1394-22-dev \
        libavcodec-dev \
        libavformat-dev \
        libswscale-dev \
        libtheora-dev \
        libvorbis-dev \
        libxvidcore-dev \
        libx264-dev \
        yasm \
        libopencore-amrnb-dev \
        libopencore-amrwb-dev \
        libv4l-dev \
        libxine2-dev \
        libtbb-dev \
        libeigen3-dev \
        python-tk \
        python-numpy \
        python3-dev \
        python3-tk \
        python3-numpy \
        ant \
        default-jdk \
        doxygen \
        && \
    apt-get clean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/* && \
# Link BLAS library to use OpenBLAS using the alternatives mechanism (https://www.scipy.org/scipylib/building/linux.html#debian-ubuntu)
    update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3

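# Install cuDNN v6.0
# The archive is extracted inside the image, so its files are copied in this same RUN step:
# ADD/COPY can only read files from the build context, not from the image being built
RUN wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
    cd /root/downloads && \
    tar -xzvf ${CUDNN_TAR_FILE} && \
    cp cuda/include/cudnn.h /usr/local/cuda-8.0/include && \
    cp cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/ && \
    chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*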

ENV CUDA_HOME=/usr/local/cuda
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64 

# Point libcudnn.so -> libcudnn.so.6 -> libcudnn.so.6.x.y with symbolic links, then refresh the linker cache
RUN cd /usr/local/cuda/lib64 && \
    ln -sf libcudnn.so.6.* libcudnn.so.6 && \
    ln -sf libcudnn.so.6 libcudnn.so && \
    ldconfig

# Install pip
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    rm get-pip.py

# Add SNI support to Python
RUN pip --no-cache-dir install \
        pyopenssl \
        ndg-httpsclient \
        pyasn1

# Install useful Python packages using apt-get to avoid version incompatibilities with Tensorflow binary
# especially numpy, scipy, skimage and sklearn (see https://github.com/tensorflow/tensorflow/issues/2034)
RUN apt-get update && apt-get install -y \
        python-numpy \
        python-scipy \
        python-nose \
        python-h5py \
        python-skimage \
        python-matplotlib \
        python-pandas \
        python-sklearn \
        python-sympy \
        && \
    apt-get clean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Install other useful Python packages using pip
RUN pip --no-cache-dir install --upgrade ipython pandas scikit-learn && \
    pip --no-cache-dir install \
        Cython \
        click \
        grequests \
        h5py \
        python-dotenv \
        sqlalchemy-redshift \
        gevent \
        awscli \
        ipykernel \
        jupyter \
        path.py \
        Pillow \
        pygments \
        six \
        sphinx \
        wheel \
        zmq \
        && \
    python -m ipykernel.kernelspec

# Install TensorFlow
RUN pip --no-cache-dir install \
    https://storage.googleapis.com/tensorflow/linux/${TENSORFLOW_ARCH}/tensorflow_${TENSORFLOW_ARCH}-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl

# Install dependencies for Caffe
RUN apt-get update && apt-get install -y \
        libboost-all-dev \
        libgflags-dev \
        libgoogle-glog-dev \
        libhdf5-serial-dev \
        libleveldb-dev \
        liblmdb-dev \
        libopencv-dev \
        libprotobuf-dev \
        libsnappy-dev \
        protobuf-compiler \
        && \
    apt-get clean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Install Caffe
RUN git clone -b ${CAFFE_VERSION} --depth 1 https://github.com/BVLC/caffe.git /root/caffe && \
    cd /root/caffe && \
    cat python/requirements.txt | xargs -n1 pip install && \
    mkdir build && cd build && \
    cmake -DUSE_CUDNN=1 -DBLAS=Open .. && \
    make -j"$(nproc)" all && \
    make install

# Set up Caffe environment variables
ENV CAFFE_ROOT=/root/caffe
ENV PYCAFFE_ROOT=$CAFFE_ROOT/python
ENV PYTHONPATH=$PYCAFFE_ROOT:$PYTHONPATH \
    PATH=$CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH

RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig

# Install Theano and set up Theano config (.theanorc) for CUDA and OpenBLAS
RUN pip --no-cache-dir install git+https://github.com/Theano/Theano.git@${THEANO_VERSION} && \
    \
    echo "[global]\ndevice=gpu\nfloatX=float32\noptimizer_including=cudnn\nmode=FAST_RUN \
        \n[lib]\ncnmem=0.95 \
        \n[nvcc]\nfastmath=True \
        \n[blas]\nldflag = -L/usr/lib/openblas-base -lopenblas \
        \n[DebugMode]\ncheck_finite=1" \
    > /root/.theanorc

# Install Keras
RUN pip --no-cache-dir install git+https://github.com/fchollet/keras.git@${KERAS_VERSION}

# Install Lasagne
RUN pip --no-cache-dir install git+https://github.com/Lasagne/Lasagne.git@${LASAGNE_VERSION}

# Install Torch
RUN git clone https://github.com/torch/distro.git /root/torch --recursive && \
    cd /root/torch && \
    bash install-deps && \
    yes no | ./install.sh

# Export the Lua environment variables manually
ENV LUA_PATH='/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua' \
    LUA_CPATH='/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so' \
    PATH=/root/torch/install/bin:$PATH \
    LD_LIBRARY_PATH=/root/torch/install/lib:$LD_LIBRARY_PATH \
    DYLD_LIBRARY_PATH=/root/torch/install/lib:$DYLD_LIBRARY_PATH
ENV LUA_CPATH='/root/torch/install/lib/?.so;'$LUA_CPATH

# Install the latest versions of nn, cutorch, cunn, cuDNN bindings and iTorch
RUN luarocks install nn && \
    luarocks install cutorch && \
    luarocks install cunn && \
    luarocks install loadcaffe && \
    \
    cd /root && git clone https://github.com/soumith/cudnn.torch.git && cd cudnn.torch && \
    git checkout R4 && \
    luarocks make && \
    \
    cd /root && git clone https://github.com/facebook/iTorch.git && \
    cd iTorch && \
    luarocks make

# Install OpenCV
RUN git clone --depth 1 https://github.com/opencv/opencv.git /root/opencv && \
    cd /root/opencv && \
    mkdir build && \
    cd build && \
    cmake -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs -DWITH_QT=ON -DWITH_OPENGL=ON -DFORCE_VTK=ON -DWITH_TBB=ON -DWITH_GDAL=ON -DWITH_XINE=ON -DBUILD_EXAMPLES=ON .. && \
    make -j"$(nproc)"  && \
    make install && \
    ldconfig && \
    echo 'ln /dev/null /dev/raw1394' >> ~/.bashrc

# Expose ports for TensorBoard (6006), IPython/Jupyter (8888) and a Flask service (8080)
EXPOSE 6006 8888 8080

WORKDIR "/root"
CMD ["/bin/bash"]
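The Theano step writes `.theanorc` with `echo` and `\n` escapes. That works because Docker runs shell-form `RUN` commands with `/bin/sh -c`, and on Ubuntu `/bin/sh` is dash, whose `echo` interprets backslash escapes; bash's builtin `echo` does not unless given `-e`. A minimal sketch of writing the same settings with `printf`, which behaves the same in both shells; `write_theanorc` is a hypothetical helper name, and the blank lines between sections are only cosmetic:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Theano config used in the Dockerfile.
# printf '%s\n' prints each argument on its own line in dash and bash alike.
set -euo pipefail

write_theanorc() {
  local target="${1:-/root/.theanorc}"
  printf '%s\n' \
    '[global]' \
    'device=gpu' \
    'floatX=float32' \
    'optimizer_including=cudnn' \
    'mode=FAST_RUN' \
    '' \
    '[lib]' \
    'cnmem=0.95' \
    '' \
    '[nvcc]' \
    'fastmath=True' \
    '' \
    '[blas]' \
    'ldflag = -L/usr/lib/openblas-base -lopenblas' \
    '' \
    '[DebugMode]' \
    'check_finite=1' \
    > "$target"
}

# Usage: write_theanorc            # writes /root/.theanorc
#        write_theanorc ./theanorc # writes to a custom path
```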
reppolice commented 7 years ago

Is this supposed to work with the repository I just cloned? The "build" instructions are missing a parameter (I entered a dot "." for the current path), and I could not get a GPU build. Log attached: error.txt

jtryan commented 7 years ago

@reppolice No Pull Request was made for this code change. The easy solution is to clone this repo: git@github.com:Paperspace/dl-docker.git

reppolice commented 7 years ago

@jtryan I cloned that one, but didn't have much success. I don't think it would be a problem on my side, would it?

... 2017-10-21 20:47:49 (1.84 MB/s) - '/root/downloads/cudnn-8.0-linux-x64-v6.0.tgz' saved [201134139/201134139]

cuda/include/cudnn.h
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.6
cuda/lib64/libcudnn.so.6.0.21
cuda/lib64/libcudnn_static.a
 ---> fb47c42ccca4
Removing intermediate container 871dccc622e8
Step 13/40 : ADD cuda/include/cudnn.h /usr/local/cuda-8.0/include
ADD failed: stat /var/lib/docker/tmp/docker-builder494311789/cuda/include/cudnn.h: no such file or directory
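The failing step is `ADD cuda/include/cudnn.h ...`: `ADD` and `COPY` take their sources from the build context sent by the Docker client, not from the filesystem of the image being built, so files that an earlier `RUN` extracted under `/root/downloads` are invisible to them. A minimal bash sketch of doing the extraction and the copy in one shell step (what a single `RUN` would execute), using the `cuda/include` and `cuda/lib64` layout from the tar listing above; `install_cudnn_tarball` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a cuDNN tarball and copy it into a CUDA tree,
# entirely inside the image, instead of ADDing files from the build context.
set -euo pipefail

install_cudnn_tarball() {
  local tarball="$1" cuda_root="$2"
  local workdir
  workdir=$(mktemp -d)
  # The NVIDIA archive unpacks into cuda/include and cuda/lib64
  tar -xzf "$tarball" -C "$workdir"
  mkdir -p "$cuda_root/include" "$cuda_root/lib64"
  cp "$workdir/cuda/include/cudnn.h" "$cuda_root/include/"
  # -P copies symlinks as symlinks instead of following them
  cp -P "$workdir"/cuda/lib64/libcudnn* "$cuda_root/lib64/"
  # chmod on a symlink changes the file it points to
  chmod a+r "$cuda_root/include/cudnn.h" "$cuda_root"/lib64/libcudnn*
  rm -rf "$workdir"
}

# Usage inside a RUN step (as root):
#   install_cudnn_tarball /root/downloads/cudnn-8.0-linux-x64-v6.0.tgz /usr/local/cuda-8.0
```

The `cp -P` is a design choice: if the archive stores `libcudnn.so` and `libcudnn.so.6` as symlinks, they stay symlinks, whereas a plain `cp` follows them and writes full copies of the library.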

pbamotra commented 7 years ago

Try this. I'm just a beginner with Docker, so excuse my hacky way of doing things.

Dockerfile.gpu.txt

jtryan commented 7 years ago

@reppolice No, I got the same error. @pbamotra's change to Dockerfile.gpu (adding this section) should work fine if you replace Dockerfile.gpu from this repo with his file above. It is a long build though... :smile:

adamjosephjensen commented 6 years ago

Hi, I'm running into the cuDNN 6 / libcudnn.so.6 issue when trying to import TensorFlow (tf-nightly-gpu==1.5.0-dev20171127).

Would you be willing to give me instructions on how to resolve this? I'm not sure exactly what to do with the Dockerfile, but I will try to figure it out in the meantime.

Thanks in advance. Aside from this (which is blocking training), using FloydHub has been great.
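When an import fails over a cuDNN library, it helps to confirm which cuDNN release the installed header belongs to. A minimal bash sketch that reads the version macros, assuming the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integers on `#define` lines (the case for cuDNN 6 and 7; later releases moved these macros to `cudnn_version.h`); `cudnn_version` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cuDNN version recorded in a header as MAJOR.MINOR.PATCH.
# Assumes "#define CUDNN_MAJOR <int>" style lines (cuDNN 6/7 cudnn.h).
set -euo pipefail

cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  local major minor patch
  # awk splits on runs of whitespace, so aligned defines are handled
  major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")
  minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")
  patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")
  if [ -z "$major" ]; then
    echo "CUDNN_MAJOR not found in $header" >&2
    return 1
  fi
  echo "${major}.${minor}.${patch}"
}

# Usage: cudnn_version /usr/local/cuda/include/cudnn.h
```

Comparing this against the `libcudnn.so.<version>` file names in `lib64` shows whether the header and the libraries come from the same release.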

EDIT

I partially resolved the issue with the following setup.sh script:

#!/bin/bash

# Print the search paths and the cuDNN files currently in the CUDA tree
show_cudnn_env() {
    echo 'PATH is:'
    echo "$PATH"
    echo 'LD_LIBRARY_PATH is:'
    echo "$LD_LIBRARY_PATH"
    echo "lib64:"
    ls /usr/local/cuda/lib64/ | grep "libcudnn"
    echo "include: "
    ls /usr/local/cuda/include/ | grep "cudnn"
}

show_cudnn_env

CUDNN_TAR_FILE=cudnn-8.0-linux-x64-v6.0.tgz

# Download and extract cuDNN 6 for CUDA 8.0, then copy it into the CUDA tree
wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
cd /root/downloads && \
tar -xzvf ${CUDNN_TAR_FILE} && \
cp cuda/include/cudnn.h /usr/local/cuda-8.0/include && \
cp cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/ && \
chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*

# Unless this script is sourced, these exports only affect the script and the processes it starts
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64

# Rebuild libcudnn.so -> libcudnn.so.6 -> libcudnn.so.6.x.y as symbolic links and refresh the linker cache
cd /usr/local/cuda/lib64 && \
ln -sf libcudnn.so.6.* libcudnn.so.6 && \
ln -sf libcudnn.so.6 libcudnn.so && \
ldconfig

show_cudnn_env

# This may not be necessary or useful
pip3 install cudnn-python-wrappers

pip3 install tf-nightly-gpu

However, this means I have to download the cuDNN archive from NVIDIA on every build, which is obviously less than ideal. Would FloydHub be willing to fix this?
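Re-downloading on every run can be avoided if the download location survives between runs, for example a mounted volume (an assumption about the setup; inside a Dockerfile, keeping the download in its own `RUN` step lets Docker's layer cache reuse it instead). A minimal bash sketch of a fetch-once helper; `fetch_once` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a file only if it is not already present.
# Assumes a non-empty file at the destination is a complete earlier download.
set -euo pipefail

fetch_once() {
  local url="$1" dest="$2"
  if [ -s "$dest" ]; then
    echo "using cached $dest"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  # Download to a temporary name so an interrupted transfer never
  # leaves a partial file at $dest that would later pass the -s check
  if ! wget -q -O "$dest.part" "$url"; then
    rm -f "$dest.part"
    return 1
  fi
  mv "$dest.part" "$dest"
}

# Usage:
#   fetch_once http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/cudnn-8.0-linux-x64-v6.0.tgz \
#     /root/downloads/cudnn-8.0-linux-x64-v6.0.tgz
```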