AndrewMead10 opened this issue 2 months ago
We are considering whether supporting PyTorch >= 2.4.0 is necessary at this time, since switching would impact users on cuDNN 8. However, it might be better to follow PyTorch and upgrade to cuDNN 9.
Just a heads up, I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.
I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).
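If it helps anyone verify their own build: a quick sanity check (just a sketch, using the public ctranslate2 Python API) is to confirm the installed wheel imports cleanly and sees the GPU:
# should print the number of visible GPUs (>= 1) if CUDA/cuDNN loaded correctly
python3 -c "import ctranslate2; print(ctranslate2.get_cuda_device_count())"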
Hey, would you mind sharing your Dockerfile and any additional relevant commands you've used? I'm trying to switch the faster-whisper-server project over to the latest CUDA with cuDNN 9. Thanks!
#FROM nvidia/cuda:12.1.0-devel-ubuntu20.04 as builder
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3-dev \
        python3-pip \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /root
ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        intel-oneapi-mkl-devel-$ONEAPI_VERSION \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN python3 -m pip --no-cache-dir install cmake==3.22.*
ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*
ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*
RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12
COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .
ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}
RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
          -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
          -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
          -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install
ENV LANG=en_US.UTF-8
COPY README.md .
RUN cd python && \
    python3 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib
# COPY --from=builder $CTRANSLATE2_ROOT $CTRANSLATE2_ROOT
RUN python3 -m pip --no-cache-dir install $CTRANSLATE2_ROOT/*.whl
# Deliberately not removing the wheel so it can still be copied out of the image:
# && rm $CTRANSLATE2_ROOT/*.whl
ENTRYPOINT ["/opt/ctranslate2/bin/ct2-translator"]
Build it with docker build --progress plain -f Dockerfile . from the repository root.
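The ARG defaults near the top (CXX_FLAGS, CUDA_NVCC_FLAGS, CUDA_ARCH_LIST) can also be overridden at build time if needed; for example, to target a single GPU architecture and shrink the binary (the 8.6 value below is just an illustration, not something from this thread):
# hypothetical example: build only for compute capability 8.6
docker build --build-arg CUDA_ARCH_LIST="8.6" --progress plain -f Dockerfile .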
If you have problems, I've pushed the image to the Docker container registry as drake7707/ctranslate2-cudnn9. You can copy /opt/ctranslate2 out of it into your own image. Don't forget to add it to the LD_LIBRARY_PATH and install the built wheel (also in /opt/ctranslate2). I didn't have to change anything else to get faster-whisper to work.
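If you'd rather not run the image, one way to extract /opt/ctranslate2 from it without starting a container process (the ct2tmp name is arbitrary):
# create a stopped container from the image, copy the directory out, then clean up
docker create --name ct2tmp drake7707/ctranslate2-cudnn9
docker cp ct2tmp:/opt/ctranslate2 ./ctranslate2
docker rm ct2tmp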
The Dockerfile is mostly the same. I got a circular dependency with the multi-stage build for some reason and couldn't spot the issue quickly, so I did away with that. I had to install the cuDNN 9 dev kit with RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12, and made sure not to remove the wheel file so I could still copy it out. On the runtime side I had an issue where libstdc++ in my runtime container's anaconda environment was outdated (the build here links against a newer version), but a conda install -c conda-forge libstdcxx-ng=12 --yes fixed that.
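For anyone hitting the same thing: one way to check which GLIBCXX symbol versions the active conda environment's libstdc++ actually provides (this assumes $CONDA_PREFIX is set, as it is inside an activated environment):
# list the newest GLIBCXX versions exported by the environment's libstdc++
strings "$CONDA_PREFIX/lib/libstdc++.so.6" | grep GLIBCXX | tail -n 3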
Regarding faster-whisper, I was able to reproduce the same bug on torch >= 2.4.0. According to https://github.com/pytorch/pytorch/issues/100974, torch bundles its own cuDNN, and torch >= 2.4.0 (which bundles cuDNN 9) is therefore incompatible with CTranslate2. It would be great if CTranslate2 supported cuDNN 9 so I could use it with torch >= 2.4.0.
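To see which cuDNN a given torch build actually bundles, you can ask torch itself (a 2.4.x build should report a 9xxxx number, i.e. cuDNN 9):
# prints the CUDA version and the bundled cuDNN version number
python3 -c "import torch; print(torch.version.cuda, torch.backends.cudnn.version())"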
@drake7707's great Dockerfile worked for me, except that:
- I needed python 3.10, whereas the base image has python 3.11. I explicitly installed python3.10 with apt-get and then used that for all python commands after the cmake.
- I wanted to install the wheel with pip in a venv and have it "just work". To do that, I needed the shared binaries inside the wheel, so I used auditwheel to "repair" the wheel.
After those changes (Dockerfile at the bottom), I was able to get the wheel out of the docker container and install it as a dependency with pip:
(.venv) $ docker build . -t drake7707/ctranslate2-cudnn9:python3.10
(.venv) $ docker run -it --rm -v ./outputdir:/opt/share/outputdir drake7707/ctranslate2-cudnn9:python3.10
root # cp /opt/ctranslate2/ctranslate2-*.whl /opt/share/outputdir/
root # exit
(.venv) $ pip install outputdir/ctranslate2-*.whl
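To confirm the wheel installed into the venv correctly before wiring it into a larger project, a minimal check is:
# should print the CTranslate2 version without any cuDNN loading errors
python -c "import ctranslate2; print(ctranslate2.__version__)"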
Here is my Dockerfile, mostly what @drake7707 originally wrote:
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3.10-dev \
        python3-dev \
        python3-pip \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /root
ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        intel-oneapi-mkl-devel-$ONEAPI_VERSION \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN python3 -m pip --no-cache-dir install cmake==3.22.*
ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*
ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*
RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12
COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .
ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}
RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
          -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
          -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
          -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install
ENV LANG=en_US.UTF-8
COPY README.md .
RUN cd python && \
    python3.10 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3.10 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib
RUN python3.10 -m pip --no-cache-dir install auditwheel && \
    auditwheel repair --plat linux_x86_64 $CTRANSLATE2_ROOT/*.whl && \
    cp /root/wheelhouse/ctranslate2-*.whl ${CTRANSLATE2_ROOT}/
CMD ["bash"]
@minhthuc2502 Could you please let us know if there is an update on this and an estimated timeline?
Hello, I will update to cuDNN 9 for the next release. Currently you can build CTranslate2 with cuDNN 9 without any problem.
When will the next release be? I don't see any proposed pull requests regarding cuDNN 9+ support yet. A lot of libraries are now requiring it...
I'm currently trying to use whisperX (which uses faster-whisper, which uses CTranslate2), and am getting the following error:
Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!
which from what I can tell is due to having cuDNN 9 instead of cuDNN 8. This is an issue because it looks like pytorch >= 2.4.0 is compiled against cuDNN 9.
See the discussion here: https://github.com/SYSTRAN/faster-whisper/pull/958
The workaround right now is just to use a pytorch version < 2.4.
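For example, pinning with pip (adjust companion packages such as torchaudio to match your setup):
# keep torch on the last release that still bundles cuDNN 8
pip install "torch<2.4"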