hkunzhe closed this 4 months ago
git clone https://github.com/vllm-project/vllm.git && cd vllm && git checkout v0.3.3
pyproject.toml, requirements-build.txt, requirements.txt and setup.py.
# The vLLM Dockerfile is used to construct vLLM image that can be directly used
# to run the OpenAI compatible server.

#################### BASE BUILD IMAGE ####################
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev

RUN apt-get update -y \
    && apt-get install -y python3-pip git

RUN ldconfig /usr/local/cuda-11.8/compat/

WORKDIR /workspace

COPY requirements.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118

COPY requirements-dev.txt requirements-dev.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements-dev.txt
#################### BASE BUILD IMAGE ####################

#################### EXTENSION BUILD IMAGE ####################
FROM dev AS build

COPY requirements-build.txt requirements-build.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cu118

COPY csrc csrc
COPY setup.py setup.py
COPY requirements.txt requirements.txt
COPY pyproject.toml pyproject.toml
COPY vllm/__init__.py vllm/__init__.py

ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}

ARG max_jobs=4
ENV MAX_JOBS=${max_jobs}

ARG nvcc_threads=8
ENV NVCC_THREADS=$nvcc_threads

ENV VLLM_INSTALL_PUNICA_KERNELS=1

RUN PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cu118 python3 setup.py build_ext --inplace
#################### EXTENSION Build IMAGE ####################

#################### TEST IMAGE ####################
FROM dev AS test
WORKDIR /vllm-workspace
ADD . /vllm-workspace/
COPY --from=build /workspace/vllm/*.so /vllm-workspace/vllm/

RUN rm pyproject.toml
RUN --mount=type=cache,target=/root/.cache/pip VLLM_USE_PRECOMPILED=1 pip install . --verbose --extra-index-url https://download.pytorch.org/whl/cu118
#################### TEST IMAGE ####################

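RUN pip install auto-gptq==0.6.0 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
# Quote the specifiers: unquoted, the shell treats ">" in pandas>=2.0.0 as a redirect and "[srt]" as a glob pattern.
RUN pip install func_timeout decord "sglang[srt]==0.1.13" "pandas>=2.0.0"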
RUN pip install outlines==0.0.34
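
For reference, here is a rough sketch of how the Dockerfile above could be built. It only prints the command (a dry run) so it can be inspected first. The image tag `vllm-cu118:0.3.3` and the narrowed arch list are assumptions for illustration, not something from the original setup; `torch_cuda_arch_list`, `max_jobs` and `nvcc_threads` are the build args declared in the Dockerfile. BuildKit is enabled because the `RUN --mount=type=cache` lines need it.

```shell
#!/bin/sh
# Hypothetical image tag; pick whatever name suits your registry.
IMAGE_TAG="${IMAGE_TAG:-vllm-cu118:0.3.3}"

# A100 is compute capability 8.0 and A10 is 8.6. Narrowing the list to these
# two should shorten the extension build, but the image will then not run on
# other GPU generations. Drop the build arg to keep the Dockerfile default.
ARCH_LIST="${ARCH_LIST:-8.0 8.6}"

# Print the build command for review; copy it or pipe it to `sh` to run it.
print_build_cmd() {
    printf '%s\n' "DOCKER_BUILDKIT=1 docker build --build-arg torch_cuda_arch_list='$ARCH_LIST' --build-arg max_jobs=4 --build-arg nvcc_threads=8 -t $IMAGE_TAG ."
}

print_build_cmd
```

Run it from the root of the vllm checkout so that the build context (`.`) contains `csrc`, `setup.py` and the requirements files.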
@bubbliiiing The previous Docker image was built and tested on an A10. On an A100, however, we hit RuntimeError: Triton Error [CUDA]: device kernel image is invalid. After digging into https://github.com/triton-lang/triton/issues/1955, we found that the following line should be added to the Dockerfile.
ENV TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas TRITON_CUOBJDUMP_PATH=/usr/local/cuda/bin/cuobjdump TRITON_NVDISASM_PATH=/usr/local/cuda/bin/nvdisasm
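
A quick sanity check can be run inside the container to confirm that the three variables point at real binaries. This is a minimal sketch: `check_triton_tools` is a hypothetical helper name, and it only verifies that each path is an executable file, not that Triton actually picks it up.

```shell
#!/bin/sh
# Report whether each Triton tool override points at an executable file.
# Prints one line per variable and returns non-zero if any is missing.
check_triton_tools() {
    rc=0
    for var in TRITON_PTXAS_PATH TRITON_CUOBJDUMP_PATH TRITON_NVDISASM_PATH; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "tool_path=\${$var:-}"
        if [ -f "$tool_path" ] && [ -x "$tool_path" ]; then
            echo "ok: $var=$tool_path"
        else
            echo "missing: $var=${tool_path:-<unset>}"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling `check_triton_tools` in a shell inside the image should print three `ok:` lines if the CUDA toolkit binaries are where the ENV line expects them.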
Since vLLM does not release official builds for torch 2.2.0 with CUDA 11.8, I built vLLM 0.3.3 from source.
The Dockerfile above also installs additional requirements for video captioning.