OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

CUDNN 9 support #1780

Open · AndrewMead10 opened 2 months ago

AndrewMead10 commented 2 months ago

I'm currently trying to use whisperX (which uses faster-whisper, which in turn uses CTranslate2) and am getting the following error:

Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!

From what I can tell, this is due to having cuDNN version >= 9 installed. This is an issue because PyTorch >= 2.4.0 is compiled against cuDNN 9.

See the discussion here:

https://github.com/SYSTRAN/faster-whisper/pull/958

The workaround right now is to use a PyTorch version < 2.4.
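
For example, a minimal sketch of that pin (the cu121 wheel index URL and the torchaudio pin are my additions; adjust them to your CUDA version and setup):

# pin PyTorch below 2.4 so its bundled cuDNN stays on version 8
pip install "torch<2.4" "torchaudio<2.4" --index-url https://download.pytorch.org/whl/cu121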

minhthuc2502 commented 2 months ago

I am considering whether upgrading to PyTorch >= 2.4.0 is necessary at this time, to avoid impacting users on cuDNN 8. However, it might be better to follow PyTorch and upgrade to cuDNN 9.

drake7707 commented 2 months ago

Just a heads up: I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.

I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel Docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).

fedirz commented 2 months ago

> Just a heads up: I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.
>
> I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel Docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).

Hey, would you mind sharing your Dockerfile and any additional relevant commands you used? I'm trying to switch the faster-whisper-server project over to the latest CUDA with cuDNN 9. Thanks!

drake7707 commented 2 months ago

#FROM nvidia/cuda:12.1.0-devel-ubuntu20.04 as builder
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3-dev \
        python3-pip \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root

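# Intel oneAPI MKL provides the CPU BLAS backend enabled below with -DWITH_MKL=ON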
ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        intel-oneapi-mkl-devel-$ONEAPI_VERSION \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip --no-cache-dir install cmake==3.22.*

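# oneDNN (DNNL) supplies the CPU convolution primitives needed for -DWITH_DNNL=ON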
ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*

ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*

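# the base image ships the cuDNN 9 runtime; this dev package adds the headers needed for -DWITH_CUDNN=ON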
RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12

COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}

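# configure and build CTranslate2 against the CUDA 12.4 / cuDNN 9 toolchain from the base image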
RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
          -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
          -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
          -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN cd python && \
    python3 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT

ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

#COPY --from=builder $CTRANSLATE2_ROOT $CTRANSLATE2_ROOT
RUN python3 -m pip --no-cache-dir install $CTRANSLATE2_ROOT/*.whl
#&& \
#    rm $CTRANSLATE2_ROOT/*.whl

ENTRYPOINT ["/opt/ctranslate2/bin/ct2-translator"]

Build it with: docker build --progress plain -f Dockerfile ..

If you have problems, I've pushed the image to the Docker container registry as drake7707/ctranslate2-cudnn9. You can copy /opt/ctranslate2 out of it into your own image. Don't forget to add it to LD_LIBRARY_PATH and to install the built wheel (also in /opt/ctranslate2). I didn't have to change anything else to get faster-whisper to work.

The Dockerfile is mostly the same. I got a circular dependency with the multi-stage build for some reason and couldn't spot the issue quickly, so I did away with that. I had to install the cuDNN 9 dev kit with RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12, and I made sure not to remove the wheel file so I could still copy it out. On the runtime side I had an issue where libstdc++ in my runtime container's Anaconda environment was outdated (the library is linked against a newer version here), but a conda install -c conda-forge libstdcxx-ng=12 --yes fixed that.
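
For anyone who wants to extract the build from that image onto the host instead, a rough sketch (the ct2-tmp container name and the target directory are just placeholders):

# create a stopped container so we can copy files out of the image
docker create --name ct2-tmp drake7707/ctranslate2-cudnn9
# copy the install tree (shared libs + built wheel), then clean up
docker cp ct2-tmp:/opt/ctranslate2 ./ctranslate2
docker rm ct2-tmp
# install the wheel and make the shared libraries visible at runtime
pip install ./ctranslate2/*.whl
export LD_LIBRARY_PATH=$PWD/ctranslate2/lib:$LD_LIBRARY_PATH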

jhj0517 commented 1 month ago

Regarding faster-whisper, I was able to reproduce the same bug on torch >= 2.4.0.

According to https://github.com/pytorch/pytorch/issues/100974, torch ships its own bundled cuDNN, and torch >= 2.4.0 (which bundles cuDNN 9) is therefore incompatible with CTranslate2.

It would be great if CTranslate2 supported cuDNN 9 so I could use it on torch >= 2.4.0.
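
As a quick check, torch exposes the bundled cuDNN version at runtime, so you can confirm what your environment actually loads (this prints an integer such as 8902 for cuDNN 8.9.2 or 90100 for cuDNN 9.1.0):

python3 -c "import torch; print(torch.version.cuda, torch.backends.cudnn.version())"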

kittsil commented 1 month ago

@drake7707's great Dockerfile worked for me, except that:

  1. I am running Python 3.10, whereas the base image has Python 3.11. I explicitly installed python3.10 with apt-get and then used it for all Python commands after the cmake step.
  2. I wanted to be able to install the wheel with pip in a venv and have it "just work." To do that, I needed the shared libraries bundled in the wheel, so I used auditwheel to "repair" it.

After those changes (shown at the bottom), I was able to get the wheel out of the Docker container and install it as a dependency with pip:

(.venv) $ docker build . -t drake7707/ctranslate2-cudnn9:python3.10
(.venv) $ docker run -it --rm -v ./outputdir:/opt/share/outputdir drake7707/ctranslate2-cudnn9:python3.10
root # cp /opt/ctranslate2/ctranslate2-*.whl /opt/share/outputdir/
root # exit
(.venv) $ pip install outputdir/ctranslate2-*.whl
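
To sanity-check the install, importing the package in the venv should work; get_cuda_device_count is, as far as I know, part of the ctranslate2 Python API:

(.venv) $ python -c "import ctranslate2; print(ctranslate2.__version__, ctranslate2.get_cuda_device_count())"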

Here is my Dockerfile, mostly what @drake7707 originally wrote:

FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    python3.10-dev \
    python3-dev \
    python3-pip \
    wget \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root

ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    intel-oneapi-mkl-devel-$ONEAPI_VERSION \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip --no-cache-dir install cmake==3.22.*

ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*

ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*

RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12

COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}

RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
    -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
    -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
    -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN cd python && \
    python3.10 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3.10 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT

ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

RUN python3.10 -m pip --no-cache-dir install auditwheel && \
    auditwheel repair --plat linux_x86_64 $CTRANSLATE2_ROOT/*.whl && \
    cp /root/wheelhouse/ctranslate2-*.whl ${CTRANSLATE2_ROOT}/

CMD ["bash"]

Jiltseb commented 1 month ago

@minhthuc2502 Could you please let us know if there is an update on this and the timeline estimate?

minhthuc2502 commented 1 month ago

Hello, I will update to cuDNN 9 for the next release. Currently, you can build CTranslate2 with cuDNN 9 without any problems.

BBC-Esq commented 1 month ago

When will the next release be? I don't see any proposed pull requests regarding cuDNN 9+ support yet, and a lot of libraries now require it...

MahmoudAshraf97 commented 1 month ago

#1803