jiabao-wang opened 5 days ago
@jiabao-wang Hi, are you building the Docker image on an x86_64 platform? Currently, the Dockerfile only supports aarch64. For x86_64, the PyPI package of dlinfer has not been uploaded, which also causes the problem you mentioned above. There is a workaround for this case, but it has not been released yet.
@jiabao-wang Here is a new Dockerfile for the Ascend x86_64 platform. It has only been tested for building on an x86_64 machine; model inference is untested, since we don't have an x86_64 Ascend NPU machine.
FROM ubuntu:20.04 as base_builder
WORKDIR /tmp
ARG http_proxy
ARG https_proxy
ARG DEBIAN_FRONTEND=noninteractive
RUN sed -i 's@http://.*.ubuntu.com@http://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
apt update && \
apt install --no-install-recommends ca-certificates -y && \
apt install --no-install-recommends bc wget -y && \
apt install --no-install-recommends git curl gcc make g++ pkg-config unzip -y && \
apt install --no-install-recommends libsqlite3-dev libblas3 liblapack3 gfortran vim -y && \
apt install --no-install-recommends liblapack-dev libblas-dev libhdf5-dev libffi-dev -y && \
apt install --no-install-recommends libssl-dev zlib1g-dev xz-utils cython3 python3-h5py -y && \
apt install --no-install-recommends libopenblas-dev libgmpxx4ldbl liblzma-dev -y && \
apt install --no-install-recommends libicu66 libxml2 pciutils libgl1-mesa-glx libbz2-dev -y && \
apt install --no-install-recommends libreadline-dev libncurses5 libncurses5-dev libncursesw5 -y && \
sed -i 's@http://mirrors.tuna.tsinghua.edu.cn@https://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
apt clean && rm -rf /var/lib/apt/lists/*
ARG PYVERSION=3.10.5
ENV LD_LIBRARY_PATH=/usr/local/python${PYVERSION}/lib: \
PATH=/usr/local/python${PYVERSION}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
RUN umask 0022 && \
wget https://repo.huaweicloud.com/python/${PYVERSION}/Python-${PYVERSION}.tar.xz && \
tar -xf Python-${PYVERSION}.tar.xz && cd Python-${PYVERSION} && ./configure --prefix=/usr/local/python${PYVERSION} --enable-shared && \
make -j 16 && make install && \
ln -sf /usr/local/python${PYVERSION}/bin/python3 /usr/bin/python3 && \
ln -sf /usr/local/python${PYVERSION}/bin/python3 /usr/bin/python && \
ln -sf /usr/local/python${PYVERSION}/bin/pip3 /usr/bin/pip3 && \
ln -sf /usr/local/python${PYVERSION}/bin/pip3 /usr/bin/pip && \
cd .. && \
rm -rf Python*
RUN --mount=type=cache,target=/root/.cache/pip pip3 config set global.index-url http://mirrors.aliyun.com/pypi/simple && \
pip3 config set global.trusted-host mirrors.aliyun.com && \
pip3 install -U pip && \
pip3 install wheel==0.43.0 scikit-build==0.18.0 numpy==1.24 setuptools==69.5.1 && \
pip3 install decorator sympy cffi && \
pip3 install cmake ninja pyyaml && \
pip3 install pathlib2 protobuf attrs attr scipy && \
pip3 install requests psutil absl-py
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/hdf5/serial:$LD_LIBRARY_PATH
FROM ubuntu:20.04 as build_temp
COPY . /tmp
FROM base_builder as cann_builder
ARG ASCEND_BASE=/usr/local/Ascend
ARG TOOLKIT_PATH=$ASCEND_BASE/ascend-toolkit/latest
ENV LD_LIBRARY_PATH=\
$ASCEND_BASE/driver/lib64:\
$ASCEND_BASE/driver/lib64/common:\
$ASCEND_BASE/driver/lib64/driver:\
$ASCEND_BASE/driver/tools/hccn_tool/:\
$TOOLKIT_PATH/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/x86_64/:\
$LD_LIBRARY_PATH
# run files should be placed at the root dir of repo
ARG CHIP=all
ARG TOOLKIT_PKG=Ascend-cann-toolkit_*.run
ARG KERNELS_PKG=Ascend-cann-kernels-*.run
ARG NNAL_PKG=Ascend-cann-nnal_*.run
RUN --mount=type=cache,target=/tmp,from=build_temp,source=/tmp \
umask 0022 && \
mkdir -p $ASCEND_BASE/driver && \
if [ "$CHIP" != "all" ]; \
then \
CHIPOPTION="--chip=$CHIP"; \
else \
CHIPOPTION=""; \
fi && \
chmod +x $TOOLKIT_PKG $KERNELS_PKG $NNAL_PKG && \
./$TOOLKIT_PKG --quiet --install --install-path=$ASCEND_BASE --install-for-all $CHIPOPTION && \
./$KERNELS_PKG --quiet --install --install-path=$ASCEND_BASE --install-for-all && \
. /usr/local/Ascend/ascend-toolkit/set_env.sh && \
./$NNAL_PKG --quiet --install --install-path=$ASCEND_BASE && \
rm -f $TOOLKIT_PKG $KERNELS_PKG $NNAL_PKG
ENV GLOG_v=2 \
LD_LIBRARY_PATH=$TOOLKIT_PATH/lib64:$LD_LIBRARY_PATH \
TBE_IMPL_PATH=$TOOLKIT_PATH/opp/op_impl/built-in/ai_core/tbe \
PATH=$TOOLKIT_PATH/ccec_compiler/bin:$PATH \
ASCEND_OPP_PATH=$TOOLKIT_PATH/opp \
ASCEND_AICPU_PATH=$TOOLKIT_PATH
ENV PYTHONPATH=$TBE_IMPL_PATH:$PYTHONPATH
SHELL ["/bin/bash", "-c"]
RUN echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc && \
echo "source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0" >> ~/.bashrc && \
. ~/.bashrc
# dlinfer
# timm is required for internvl2 model
WORKDIR /opt/
RUN --mount=type=cache,target=/root/.cache/pip \
pip3 install torch==2.3.1+cpu torchvision==0.18.1+cpu --index-url=https://download.pytorch.org/whl/cpu && \
pip3 install torch-npu==2.3.1 && \
pip3 install transformers timm && \
git clone https://github.com/DeepLink-org/dlinfer.git && \
cd dlinfer && DEVICE=ascend python setup.py develop
# lmdeploy
FROM build_temp as copy_temp
RUN rm -rf /tmp/*.run
FROM cann_builder as final_builder
COPY --from=copy_temp /tmp /opt/lmdeploy
WORKDIR /opt/lmdeploy
RUN --mount=type=cache,target=/root/.cache/pip \
sed -i '/triton/d' requirements/runtime.txt && \
pip3 install -v --no-build-isolation -e .
@CyCle1024 I have built the Docker image following the Dockerfile for Ascend x86_64, but when I try to run: docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
it outputs this error:
(base) wjb@ubuntu-Atlas-800-Model-3010:~$ docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "
The above exception was the direcause of the following exception:
Traceback (most recent call last):
  File "/usr/local/python3.10.5/bin/lmdeploy", line 33, in
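The truncated traceback above points at an import failure inside transformers, which a later comment resolves by pinning transformers==4.42.3. A stdlib-only sketch for checking which version of a package actually ended up inside the image (run it with `docker run … python3 -c …`; the package names here are just examples):

```python
from importlib import metadata

def installed_version(dist: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

# Inside the image one could check, for example:
#   installed_version("transformers")   # expect "4.42.3" after pinning
print(installed_version("surely-not-an-installed-dist"))  # None
```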
docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
Is this a typo? (lmdeploy-aarch64-ascend:latest)
I have a check_env result from an x86 machine; please check it.
@CyCle1024 I have successfully run "docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env" after changing "pip3 install transformers" to "pip3 install transformers==4.42.3" in the Dockerfile.
But when I test this code:
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig
if __name__ == "__main__":
    pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)
it outputs:
    pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
TypeError: PytorchEngineConfig.__init__() got an unexpected keyword argument 'device_type'
my env:
And when I run:
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image
if __name__ == "__main__":
    pipe = pipeline('OpenGVLab/InternVL2-2B', backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True))
    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    response = pipe(('describe this image', image))
    print(response)
it also outputs:
Traceback (most recent call last):
  File "/opt/mycode/vl.py", line 4, in <module>
    pipe = pipeline('OpenGVLab/InternVL2-2B', backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True))
TypeError: PytorchEngineConfig.__init__() got an unexpected keyword argument 'device_type'
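The `TypeError` above means the installed `PytorchEngineConfig` predates the `device_type` field. A minimal stdlib sketch of a defensive pattern for this situation, using `dataclasses.fields` to drop keywords the installed config class does not know about; `OldConfig` is a hypothetical stand-in for an old config class, not the real one from lmdeploy.messages:

```python
from dataclasses import dataclass, fields

# Hypothetical stand-in for a config class that predates `device_type`.
@dataclass
class OldConfig:
    tp: int = 1
    eager_mode: bool = False

def make_config(cls, **kwargs):
    """Construct cls, silently dropping keywords it does not declare."""
    known = {f.name for f in fields(cls)}
    unknown = set(kwargs) - known
    if unknown:
        print(f"warning: {cls.__name__} does not accept {sorted(unknown)}; "
              "the installed version is probably too old")
    return cls(**{k: v for k, v in kwargs.items() if k in known})

cfg = make_config(OldConfig, tp=1, device_type="ascend", eager_mode=True)
```

Silently dropping the keyword only avoids the crash, of course; the real fix discussed below is upgrading lmdeploy.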
@jiabao-wang https://github.com/InternLM/lmdeploy/blob/0c80baa001e79d0b7d182b8a670190801d2d8d5b/lmdeploy/messages.py#L222-L261 Refer to line 249 here: device_type is valid there. Which commit of lmdeploy did you use?
@CyCle1024 I use lmdeploy 0.4.0 at /opt/lmdeploy.
But how do I use it on Huawei Ascend? I followed this article: https://lmdeploy.readthedocs.io/zh-cn/latest/get_started/ascend/get_started.html
@jiabao-wang lmdeploy 0.4.0 does not support Ascend. It's odd that you are following the latest lmdeploy docs to get started on Ascend while using an old version. To enable Ascend support, use the latest release of lmdeploy, such as https://github.com/InternLM/lmdeploy/tree/v0.6.3
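A small sketch of gating Ascend usage on the installed version, so the script fails with a clear message instead of the `device_type` TypeError. The `0.6.3` floor comes from the release linked above; reading the installed version via `importlib.metadata.version("lmdeploy")` is an assumption about the environment, so the runnable part here is just the stdlib comparison helper:

```python
def version_tuple(v: str):
    """'0.6.3' -> (0, 6, 3); ignores anything after the first three components."""
    parts = []
    for p in v.split(".")[:3]:
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits or 0))
    return tuple(parts)

MIN_ASCEND = (0, 6, 3)  # first release suggested for Ascend in this thread

def supports_ascend(installed: str) -> bool:
    return version_tuple(installed) >= MIN_ASCEND

print(supports_ascend("0.4.0"))  # False: too old for device_type="ascend"
print(supports_ascend("0.6.3"))  # True
```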
Describe the bug
when I run:
DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
    -f docker/Dockerfile_aarch64_ascend .
ERROR: Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested torch==2.3.1
    torchvision 0.18.1 depends on torch==2.3.1
    torch-npu 2.3.1 depends on torch==2.3.1+cpu

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
Dockerfile_aarch64_ascend:110
  109 | # timm is required for internvl2 model
  110 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
  111 | >>>     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && \
  112 | >>>     pip3 install transformers timm && \
  113 | >>>     pip3 install dlinfer-ascend
  114 |
ERROR: failed to solve: process "/bin/bash -c pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && pip3 install transformers timm && pip3 install dlinfer-ascend" did not complete successfully: exit code: 1
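The conflict comes from PEP 440 local-version semantics: torch-npu pins `torch==2.3.1+cpu`, which the plain PyPI `torch 2.3.1` wheel cannot satisfy, while a `+cpu` candidate satisfies both pins. That is why the x86_64 Dockerfile above installs `torch==2.3.1+cpu` from the PyTorch CPU index first. A simplified sketch of the `==` matching rule (not a full PEP 440 implementation; real resolvers use the `packaging` library):

```python
def pin_matches(candidate: str, pin: str) -> bool:
    """Simplified PEP 440 '==' matching: a pin without a local label
    (e.g. '2.3.1') ignores the candidate's local label, while a pin with
    one (e.g. '2.3.1+cpu') requires it to match exactly."""
    cand_public, _, cand_local = candidate.partition("+")
    pin_public, _, pin_local = pin.partition("+")
    if pin_local:
        return cand_public == pin_public and cand_local == pin_local
    return cand_public == pin_public

# torch-npu 2.3.1 pins torch==2.3.1+cpu, so the plain PyPI wheel is rejected:
print(pin_matches("2.3.1", "2.3.1+cpu"))      # False -> ResolutionImpossible
# the +cpu wheel from the PyTorch CPU index satisfies every pin at once:
print(pin_matches("2.3.1+cpu", "2.3.1+cpu"))  # True
print(pin_matches("2.3.1+cpu", "2.3.1"))      # True (torchvision's pin)
```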
Reproduction
DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
    -f docker/Dockerfile_aarch64_ascend .
Environment
Error traceback