nickmitchko opened 2 months ago
Hi, thanks for the proposal. We can provide the CUDA 12 dockerfile as needed. However, I have some concerns about the inclusion of timm and flash-attn. LMDeploy has its own flash-attention implementation, making flash-attn unnecessary for our use case. Additionally, while timm is used by some vision-language models, it is not universally adopted, suggesting that we might want to consider alternatives or customizations tailored to each model's specific requirements.
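If we do end up including them, one option (just a sketch, not a committed design) is to gate the vision-model extras behind a Docker build argument so the default image stays lean. The INSTALL_VLM_DEPS name here is purely illustrative:

```dockerfile
# Hypothetical opt-in for vision-language extras; INSTALL_VLM_DEPS is an
# illustrative build-arg name, not an existing LMDeploy convention.
ARG INSTALL_VLM_DEPS=false
RUN if [ "${INSTALL_VLM_DEPS}" = "true" ]; then \
        python3 -m pip install --no-cache-dir timm flash-attn; \
    fi
```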
Hi @lvhan028, excluding timm and flash-attn is probably fine for most use cases. It would be nice to have the following tags for the container; example pulls are sketched below the list:
openmmlab/lmdeploy:vX.Y.Z
openmmlab/lmdeploy:vX.Y.Z-cu12x
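For example (a sketch of intended usage; the concrete version and CUDA suffix here are made up, since these tags don't exist yet):

```shell
docker pull openmmlab/lmdeploy:v0.4.2        # default (CUDA 11.8) build
docker pull openmmlab/lmdeploy:v0.4.2-cu121  # CUDA 12.1 variant
```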
With InternVL-1.5 it was a struggle to get the container into a working state. I will share my build files here for reference shortly.
Sure. We can follow the tag rules you mentioned above.
FROM nvcr.io/nvidia/tritonserver:24.04-py3

# Replace the stale CUDA apt sources, install build dependencies, and set up a venv
RUN rm /etc/apt/sources.list.d/cuda*.list && apt-get update && apt-get install -y --no-install-recommends \
    rapidjson-dev libgoogle-glog-dev gdb python3-venv \
    && rm -rf /var/lib/apt/lists/* && cd /opt && python3 -m venv py3

ENV PATH=/opt/py3/bin:$PATH

ARG CUDA_VERSION_MAJOR=12
ARG CUDA_VERSION_MINOR=4
ENV CUDA_VERSION_MAJOR=${CUDA_VERSION_MAJOR}
ENV CUDA_VERSION_MINOR=${CUDA_VERSION_MINOR}

# torch must be installed before flash-attn, since flash-attn's build imports torch
RUN python3 -m pip install --no-cache-dir --upgrade pip &&\
    python3 -m pip install --no-cache-dir torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121 &&\
    python3 -m pip install --no-cache-dir cmake packaging wheel &&\
    python3 -m pip install --no-cache-dir timm flash-attn

# NCCL launch mode for multi-GPU runs
ENV NCCL_LAUNCH_MODE=GROUP

# Should be in the lmdeploy root directory when building the docker image
COPY . /opt/lmdeploy
WORKDIR /opt/lmdeploy

# Build the C++ backend, install it into the Triton prefix, then install the Python package
RUN cd /opt/lmdeploy &&\
    python3 -m pip install --no-cache-dir -r requirements.txt &&\
    mkdir -p build && cd build &&\
    cmake .. \
        -DCMAKE_BUILD_TYPE=RelWithDebInfo \
        -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
        -DCMAKE_INSTALL_PREFIX=/opt/tritonserver \
        -DBUILD_PY_FFI=ON \
        -DCUDA_VERSION_MAJOR=${CUDA_VERSION_MAJOR} \
        -DCUDA_VERSION_MINOR=${CUDA_VERSION_MINOR} \
        -DBUILD_MULTI_GPU=ON \
        -DBUILD_CUTLASS_MOE=OFF \
        -DBUILD_CUTLASS_MIXED_GEMM=OFF \
        -DCMAKE_CUDA_FLAGS="-lineinfo" \
        -DUSE_NVTX=ON &&\
    make -j$(nproc) && make install &&\
    cd .. &&\
    python3 -m pip install -e . &&\
    rm -rf build

ENV LD_LIBRARY_PATH=/opt/tritonserver/lib:$LD_LIBRARY_PATH
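For reference, I build it from the LMDeploy source root roughly like this (the image tag is illustrative, and the -f path should point to wherever you save this file):

```shell
# Must run from the LMDeploy repo root, since the Dockerfile COPYs the whole tree.
docker build \
    --build-arg CUDA_VERSION_MAJOR=12 \
    --build-arg CUDA_VERSION_MINOR=4 \
    -t lmdeploy:cu124 \
    -f ./Dockerfile .
```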
> Currently to run a cuda-12 container image one must build their own and edit the dockerfile.
Hmm, I may be misunderstanding or naive here, but if the official image is built for CUDA 11.8 and the host driver supports >= 11.8, shouldn't the image work fine due to backwards compatibility? I've just tested that the latest official image (openmmlab/lmdeploy:v0.4.2) works fine for me on CUDA 12.0, 12.1, 12.2, 12.3, and 12.4 machines from Runpod, except for a 4090 on 12.3 for some reason, which I just opened an issue about.
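For anyone reproducing this, the check is straightforward; a sketch of the commands (the first is the source of the results below, the second is an optional extra probe I'd suggest, not part of those results):

```shell
# Driver-reported CUDA version as seen from inside the container
docker run --rm --gpus all openmmlab/lmdeploy:v0.4.2 nvidia-smi
# Optional: which CUDA toolkit torch was built against, and whether it initializes
docker run --rm --gpus all openmmlab/lmdeploy:v0.4.2 \
    python3 -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```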
These numbers are the result of running nvidia-smi from inside the official Docker image on various GPUs from Runpod:
✅ GPU: L40 Driver Version: 525.116.04 CUDA Version: 12.0
✅ GPU: 2x3090 Driver Version: 525.85.12 CUDA Version: 12.0
✅ GPU: 2x4090 Driver Version: 525.125.06 CUDA Version: 12.0
❌ GPU: 2x4090 Driver Version: 545.29.06 CUDA Version: 12.3
❌ GPU: 2x4090 Driver Version: 545.23.08 CUDA Version: 12.3
✅ GPU: A30 Driver Version: 545.23.08 CUDA Version: 12.3
✅ GPU: L40 Driver Version: 545.23.08 CUDA Version: 12.3
✅ GPU: A6000 Driver Version: 550.54.15 CUDA Version: 12.4
✅ GPU: 2x4090 Driver Version: 550.54.15 CUDA Version: 12.4
Motivation
Create container images for CUDA 12+ versions. Currently, to run a CUDA 12 container image, one must build their own image and edit the Dockerfile.
Related resources
I propose adding a new Dockerfile for specific CUDA versions: docker/Dockerfile