hkunzhe commented 5 months ago

Since vLLM does not release official builds for torch 2.2.0 with CUDA 11.8, I built vLLM 0.3.3 from source.

mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:torch2.2.0_cu118_vllm_0.3.3

The Docker image above also contains the additional requirements for video captioning.

hkunzhe commented 5 months ago

git clone https://github.com/vllm-project/vllm.git && cd vllm && git checkout v0.3.3
Modify the torch/CUDA versions in pyproject.toml, requirements-build.txt, requirements.txt, and setup.py.
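To find the lines that need editing, something like the following may help. It is only a sketch: `list_version_pins` is a hypothetical helper name, and it merely lists candidate lines in the four files named above. It does not decide which values to change.

```shell
#!/bin/sh
# Hypothetical helper (not part of vLLM): print every line that mentions
# torch or CUDA in the four files listed above, prefixed with file name and
# line number, so each pin can be reviewed and edited by hand.
list_version_pins() {
    repo_dir="${1:-.}"
    for f in pyproject.toml requirements-build.txt requirements.txt setup.py; do
        if [ -f "$repo_dir/$f" ]; then
            # -i: case-insensitive, -n: show line numbers, -E: extended regex
            grep -inE 'torch|cuda' "$repo_dir/$f" | sed "s|^|$f:|"
        fi
    done
    return 0
}
```

Run `list_version_pins .` from the vllm checkout after `git checkout v0.3.3`.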

Use the following Dockerfile:


# The vLLM Dockerfile is used to construct vLLM image that can be directly used
# to run the OpenAI compatible server.

#################### BASE BUILD IMAGE ####################
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev

RUN apt-get update -y \
    && apt-get install -y python3-pip git

# Workaround for ... 2507 and
# https://github.com/pytorch/pytorch/issues/107960 -- hopefully
# this won't be needed for future versions of this docker image
# or future versions of triton.
RUN ldconfig /usr/local/cuda-11.8/compat/

WORKDIR /workspace

# install build and runtime dependencies
COPY requirements.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118

# install development dependencies
COPY requirements-dev.txt requirements-dev.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements-dev.txt
#################### BASE BUILD IMAGE ####################

#################### EXTENSION BUILD IMAGE ####################
FROM dev AS build

# install build dependencies
COPY requirements-build.txt requirements-build.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cu118

# copy input files
COPY csrc csrc
COPY setup.py setup.py
COPY requirements.txt requirements.txt
COPY pyproject.toml pyproject.toml
COPY vllm/__init__.py vllm/__init__.py

# cuda arch list used by torch
ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
# max jobs used by Ninja to build extensions
ARG max_jobs=4
ENV MAX_JOBS=${max_jobs}
# number of threads used by nvcc
ARG nvcc_threads=8
ENV NVCC_THREADS=$nvcc_threads
# make sure punica kernels are built (for LoRA)
ENV VLLM_INSTALL_PUNICA_KERNELS=1

RUN PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cu118 python3 setup.py build_ext --inplace
#################### EXTENSION Build IMAGE ####################

#################### TEST IMAGE ####################
# image to run unit testing suite
FROM dev AS test

# copy pytorch extensions separately to avoid having to rebuild
# when python code changes
WORKDIR /vllm-workspace
# ADD is used to preserve directory structure
ADD . /vllm-workspace/
COPY --from=build /workspace/vllm/*.so /vllm-workspace/vllm/
# ignore build dependencies installation because we are using pre-compiled extensions
RUN rm pyproject.toml
RUN --mount=type=cache,target=/root/.cache/pip \
    VLLM_USE_PRECOMPILED=1 pip install . --verbose --extra-index-url https://download.pytorch.org/whl/cu118
#################### TEST IMAGE ####################

RUN pip install auto-gptq==0.6.0 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
# quote the specifiers so the shell does not treat [ ] as a glob or >= as a redirection
RUN pip install func_timeout decord "sglang[srt]==0.1.13" "pandas>=2.0.0"

# fix version
RUN pip install outlines==0.0.34
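Building the image from this Dockerfile could look roughly like the sketch below. The wrapper name `build_vllm_image`, the default tag `vllm-cu118:0.3.3`, and the `DRY_RUN` switch are all placeholders invented for illustration, not names from this thread. BuildKit is enabled because the `RUN --mount=type=cache` instructions above require it.

```shell
#!/bin/sh
# Hypothetical wrapper; tag and DRY_RUN are illustrative assumptions.
build_vllm_image() {
    tag="${1:-vllm-cu118:0.3.3}"   # placeholder tag, pick your own
    max_jobs="${2:-4}"             # forwarded to the max_jobs ARG of the build stage
    # Assemble the docker command as positional parameters.
    set -- docker build --target test --build-arg "max_jobs=$max_jobs" -t "$tag" .
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command instead of running it.
        echo "DOCKER_BUILDKIT=1 $*"
    else
        DOCKER_BUILDKIT=1 "$@"
    fi
}
```

Run it from the root of the vllm checkout, where the Dockerfile and the modified requirement files live. Setting `DRY_RUN=1` first prints the command without starting a build.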

hkunzhe commented 4 months ago

@bubbliiiing The previous Docker image was built and tested on an A10. On an A100, however, we hit RuntimeError: Triton Error [CUDA]: device kernel image is invalid. Based on the discussion in https://github.com/triton-lang/triton/issues/1955, the fix is to add the following line to the Dockerfile so that Triton uses the CUDA toolkit binaries installed in the image.

ENV TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas TRITON_CUOBJDUMP_PATH=/usr/local/cuda/bin/cuobjdump TRITON_NVDISASM_PATH=/usr/local/cuda/bin/nvdisasm
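A quick sanity check inside the container can confirm that these three variables point at real executables. This is only a sketch, and `check_triton_tools` is a hypothetical helper name, not something shipped by Triton or vLLM.

```shell
#!/bin/sh
# Hypothetical helper: report whether each Triton tool override is set and
# points at an executable file. Returns non-zero if any of them is not.
check_triton_tools() {
    status=0
    for var in TRITON_PTXAS_PATH TRITON_CUOBJDUMP_PATH TRITON_NVDISASM_PATH; do
        # Indirectly read the value of the variable named in $var.
        eval "path=\${$var:-}"
        if [ -n "$path" ] && [ -x "$path" ]; then
            echo "ok: $var=$path"
        else
            echo "missing: $var=${path:-<unset>}"
            status=1
        fi
    done
    return $status
}
```

Calling `check_triton_tools` in a shell started from the image prints one line per variable.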

aigc-apps / EasyAnimate

Fix video caption #3