Closed BoneGoat closed 4 years ago
Hi @BoneGoat,
Related issue: https://github.com/facebookresearch/wav2letter/issues/335.
W2L is compatible with CUDA 10. You can either use the base image wav2letter/wav2letter:cuda-base-10-latest and install flashlight and w2l inside it, or use wav2letter/wav2letter:cuda-10-latest, in which case you need to run git pull, cmake and make for both flashlight and wav2letter.
Probably the simplest option is to use wav2letter/wav2letter:cuda-base-10-latest and then follow the installation instructions from https://github.com/facebookresearch/wav2letter/blob/master/Dockerfile-CUDA
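As a rough sketch of the base-image route described above, the steps could look like the following. The build commands mirror the ones in the Dockerfiles posted in this thread. The container name `w2l`, the `--runtime=nvidia` flag (which assumes nvidia-docker2 is installed on the host) and the `print_w2l_setup` helper are assumptions for illustration. The helper only prints the commands so they can be reviewed before running anything.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for the base-image route.
IMAGE="wav2letter/wav2letter:cuda-base-10-latest"   # image name taken from this thread

print_w2l_setup() {
    # Unquoted here-doc delimiter so that $IMAGE is expanded in the output.
    cat <<EOF
docker pull $IMAGE
docker run --runtime=nvidia --name w2l -it $IMAGE /bin/bash
# --- inside the container ---
cd /root && git clone --recursive https://github.com/facebookresearch/flashlight.git
cd /root/flashlight && mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DFLASHLIGHT_BACKEND=CUDA
make -j8 && make install
cd /root && git clone --recursive https://github.com/facebookresearch/wav2letter.git
export MKLROOT=/opt/intel/mkl KENLM_ROOT_DIR=/root/kenlm
cd /root/wav2letter && mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_CRITERION_BACKEND=CUDA
make -j8
EOF
}
```

Calling `print_w2l_setup` writes the command list to standard output. The `MKLROOT` and `KENLM_ROOT_DIR` paths assume the base image was built as in the cuda-base-10 Dockerfile in this thread.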
I am also adding the Dockerfiles I used previously.
cuda-base-10 flashlight
# ==================================================================
# module list
# ------------------------------------------------------------------
# Ubuntu 16.04
# CUDA 10.0
# CuDNN 7-dev
# arrayfire 3.6.4 (git, CUDA backend)
# OpenMPI latest (apt)
# ==================================================================
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
rm -rf /var/lib/apt/lists/* \
/etc/apt/sources.list.d/cuda.list \
/etc/apt/sources.list.d/nvidia-ml.list && \
apt-get update && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
build-essential \
ca-certificates \
cmake \
wget \
git \
vim \
emacs \
nano \
htop \
g++ \
# ssh for OpenMPI
openssh-server openssh-client \
# OpenMPI
libopenmpi-dev libomp-dev \
# nccl: for flashlight
libnccl2 libnccl-dev \
libglfw3-dev && \
# ==================================================================
# arrayfire https://github.com/arrayfire/arrayfire/wiki/
# ------------------------------------------------------------------
cd /tmp && git clone --recursive https://github.com/arrayfire/arrayfire.git && \
cd arrayfire && git checkout v3.6.4 && git submodule update --init --recursive && \
mkdir build && cd build && \
CXXFLAGS=-DOS_LNX cmake .. -DCMAKE_BUILD_TYPE=Release -DAF_BUILD_CPU=OFF -DAF_BUILD_OPENCL=OFF -DAF_BUILD_EXAMPLES=OFF && \
make -j8 && \
make install && \
# ==================================================================
# config & cleanup
# ------------------------------------------------------------------
ldconfig && \
apt-get clean && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/* /tmp/* ~/* && \
# If the driver is not found (during docker build) the cuda driver api need to be linked against the
# libcuda.so stub located in the lib[64]/stubs directory
ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/lib/x86_64-linux-gnu/libcuda.so.1
cuda-base-10
# ==================================================================
# module list
# ------------------------------------------------------------------
# inherit from flml/flashlight:cuda-base-10-latest
# python 3.6 (apt)
# libsndfile bef2abc (git)
# MKL 2018.4.057 (apt)
# FFTW latest (apt)
# KenLM e47088d (git)
# GLOG latest (apt)
# gflags latest (apt)
# ==================================================================
FROM flml/flashlight:cuda-base-10-latest
RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \
apt-get update && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
# for libsndfile
autoconf automake autogen build-essential libasound2-dev \
libflac-dev libogg-dev libtool libvorbis-dev pkg-config python \
# for Intel's Math Kernel Library (MKL)
cpio \
# FFTW
libfftw3-dev \
# for kenlm
zlib1g-dev libbz2-dev liblzma-dev libboost-all-dev \
# gflags
libgflags-dev libgflags2v5 \
# for glog
libgoogle-glog-dev libgoogle-glog0v5 \
# for recipes data processing
sox && \
# ==================================================================
# python (for recipes data processing)
# ------------------------------------------------------------------
PIP_INSTALL="python3 -m pip --no-cache-dir install --upgrade" && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
software-properties-common \
&& \
add-apt-repository ppa:deadsnakes/ppa && \
apt-get update && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
python3.6 \
python3.6-dev \
&& \
wget -O ~/get-pip.py \
https://bootstrap.pypa.io/get-pip.py && \
python3.6 ~/get-pip.py && \
ln -s /usr/bin/python3.6 /usr/local/bin/python3 && \
ln -s /usr/bin/python3.6 /usr/local/bin/python && \
$PIP_INSTALL \
setuptools \
&& \
$PIP_INSTALL \
sox \
tqdm && \
# ==================================================================
# libsndfile https://github.com/erikd/libsndfile.git
# ------------------------------------------------------------------
cd /tmp && git clone https://github.com/erikd/libsndfile.git && \
cd libsndfile && git checkout bef2abc9e888142203953addc31c50a192e496e5 && \
./autogen.sh && ./configure --enable-werror && \
make && make check && make install && \
# ==================================================================
# MKL https://software.intel.com/en-us/mkl
# ------------------------------------------------------------------
cd /tmp && wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB && \
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB && \
wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list && \
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list' && \
apt-get update && DEBIAN_FRONTEND=noninteractive $APT_INSTALL intel-mkl-64bit-2018.4-057 && \
# ==================================================================
# KenLM https://github.com/kpu/kenlm
# ------------------------------------------------------------------
cd /root && git clone https://github.com/kpu/kenlm.git && \
cd kenlm && git checkout e47088ddfae810a5ee4c8a9923b5f8071bed1ae8 && \
mkdir build && cd build && \
cmake .. && \
make -j8 && make install && \
# ==================================================================
# config & cleanup
# ------------------------------------------------------------------
ldconfig && \
apt-get clean && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/* /tmp/*
cuda-10
# ==================================================================
# module list
# ------------------------------------------------------------------
# flashlight master (git, CUDA backend)
# ==================================================================
FROM wav2letter/wav2letter:cuda-base-10-latest
RUN mkdir /root/wav2letter
COPY . /root/wav2letter
# ==================================================================
# flashlight https://github.com/facebookresearch/flashlight.git
# ------------------------------------------------------------------
RUN cd /root && git clone --recursive https://github.com/facebookresearch/flashlight.git && \
cd /root/flashlight && mkdir -p build && \
cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DFLASHLIGHT_BACKEND=CUDA && \
make -j8 && make install && \
# ==================================================================
# wav2letter with GPU backend
# ------------------------------------------------------------------
export MKLROOT=/opt/intel/mkl && export KENLM_ROOT_DIR=/root/kenlm && \
cd /root/wav2letter && mkdir -p build && \
cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_CRITERION_BACKEND=CUDA && \
make -j8
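The three Dockerfiles above build on each other through their FROM lines, so the images have to be built in order. A minimal sketch of that order is given below. The Dockerfile file names and the `print_build_order` helper are hypothetical, while the tags are the ones referenced in this thread. The helper only prints the `docker build` commands.

```shell
#!/bin/sh
# Hypothetical helper: prints the docker build commands in dependency order.
# Columns: tag | Dockerfile name (assumed) | build context
print_build_order() {
    while IFS='|' read -r tag file context; do
        printf 'docker build -t %s -f %s %s\n' "$tag" "$file" "$context"
    done <<EOF
flml/flashlight:cuda-base-10-latest|Dockerfile.flashlight-base|.
wav2letter/wav2letter:cuda-base-10-latest|Dockerfile.w2l-base|.
wav2letter/wav2letter:cuda-10-latest|Dockerfile.w2l|.
EOF
}
```

Because the last Dockerfile uses `COPY . /root/wav2letter`, its build context (the final `.`) would need to be the root of a wav2letter checkout.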
@tlikhomanenko Thanks for the Dockerfiles! I built my own Dockerfile, which looks very similar to yours. The only differences I can spot are that I use 10.2-cudnn7-devel-ubuntu16.04 as the base image and that I'm building ArrayFire from master.
When building my Dockerfile all tests are OK except one for Flashlight:
[ RUN ] AutogradTest.Variance
/root/flashlight/tests/autograd/AutogradTest.cpp:485: Failure
Value of: allClose(calculated_var.array(), expected_var)
Actual: false
Expected: true
[ FAILED ] AutogradTest.Variance (334 ms)
I'm going to build your Dockerfile and see if that will work.
@BoneGoat,
If you are building ArrayFire from master, make sure its tests pass; maybe this was the issue in your original bug report.
For flashlight, there could be some discrepancy in precision because the kernels differ slightly. Is AutogradTest.Variance the only flashlight test that fails?
@tlikhomanenko In my original issue I was running the pre-built Docker image. I don't understand why that wouldn't work, as CUDA should be backwards compatible. To get things moving I wrote my own Dockerfile, but then AutogradTest.Variance fails; it is the only test that fails for Flashlight. All tests are OK for W2L.
I have now tested your Dockerfile and all tests are OK. So I will move forward with that one.
Thanks for your help!
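On the backwards-compatibility question, one thing that may be worth checking is whether the host driver supports the CUDA version the image was built with: a newer driver can generally run containers built against an older CUDA toolkit, but not the other way around. A minimal sketch of that comparison follows, assuming both versions are passed as `major.minor` strings. The `cuda_supported` helper is hypothetical; on recent drivers the maximum supported version is shown in the `nvidia-smi` header.

```shell
#!/bin/sh
# cuda_supported DRIVER_MAX TOOLKIT
# Hypothetical helper: succeeds (exit status 0) when the driver's maximum
# supported CUDA version is greater than or equal to the toolkit version.
cuda_supported() {
    awk -v d="$1" -v t="$2" 'BEGIN {
        split(d, dv, "."); split(t, tv, ".")
        # Compare the major version first, then the minor version, as integers.
        if (dv[1] + 0 != tv[1] + 0) exit !(dv[1] + 0 > tv[1] + 0)
        exit !(dv[2] + 0 >= tv[2] + 0)
    }'
}
```

For example, `cuda_supported 10.2 10.0 && echo "driver is new enough"` prints the message, while `cuda_supported 9.2 10.0` returns a non-zero status.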
I wanted to try out w2l with the tutorials / recipes, but I just encountered the same AutogradTest.Variance issue. I don't know enough to judge what variance type is correct, but I think I tracked it down to ArrayFire fixing the specification of their isbiased parameter here: https://github.com/arrayfire/arrayfire/pull/2710.
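For context on why the definition matters: the biased (population) variance divides the sum of squared deviations by n, while the unbiased (sample) variance divides by n - 1, so a test that expects one and receives the other can fail a closeness check. Here is a small illustration with awk, independent of ArrayFire and flashlight; the `variance` helper and its `biased`/`unbiased` argument are made up for this example.

```shell
#!/bin/sh
# variance MODE x1 x2 ...   (MODE is "biased" or "unbiased")
# Hypothetical helper: prints the variance of the given numbers.
variance() {
    mode="$1"; shift
    printf '%s\n' "$@" | awk -v mode="$mode" '
        { x[NR] = $1; sum += $1 }
        END {
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
            # biased divides by n, unbiased divides by n - 1
            printf "%.4f\n", ss / (mode == "biased" ? NR : NR - 1)
        }'
}

variance biased 1 2 3 4     # prints 1.2500
variance unbiased 1 2 3 4   # prints 1.6667
```

With only four samples the two definitions already differ by about 0.42, far more than a typical floating-point tolerance.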
@kriswuollett, we don't directly use af::var in flashlight, but we're changing the behavior of that test so things work. Thanks for flagging.
I have been able to run W2L before on this machine but now everything seems to go wrong. Any clue as to where I should start looking?
After some digging: maybe W2L isn't CUDA 10 compatible? I've tried to build my own Docker image with FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu16.04, but the tests still segfault. Is there anything else I need to change in the Dockerfile?
Just noticed the following error when building ArrayFire: