InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies. #2745

Open jiabao-wang opened 5 days ago

jiabao-wang commented 5 days ago

Checklist

Describe the bug

When I run:

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
    -f docker/Dockerfile_aarch64_ascend .

ERROR: Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested torch==2.3.1
    torchvision 0.18.1 depends on torch==2.3.1
    torch-npu 2.3.1 depends on torch==2.3.1+cpu

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Dockerfile_aarch64_ascend:110

 109 |     # timm is required for internvl2 model
 110 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
 111 | >>>     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && \
 112 | >>>     pip3 install transformers timm && \
 113 | >>>     pip3 install dlinfer-ascend
 114 |

ERROR: failed to solve: process "/bin/bash -c pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && pip3 install transformers timm && pip3 install dlinfer-ascend" did not complete successfully: exit code: 1

Reproduction

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
    -f docker/Dockerfile_aarch64_ascend .

Environment

Atlas-800-Model-3010
Ascend Docker Runtime has already been installed.

Error traceback

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest     -f docker/Dockerfile_aarch64_ascend .
[+] Building 1038.2s (15/18)                                                                                                                                                                     docker:default
 => [internal] load build definition from Dockerfile_aarch64_ascend                                                                                                                                        0.0s
 => => transferring dockerfile: 5.15kB                                                                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04                                                                                                                                            3.3s
 => [build_temp 1/2] FROM docker.io/library/ubuntu:20.04@sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b                                                                          11.5s
 => => resolve docker.io/library/ubuntu:20.04@sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b                                                                                      0.0s
 => => sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b 6.69kB / 6.69kB                                                                                                             0.0s
 => => sha256:e5a6aeef391a8a9bdaee3de6b28f393837c479d8217324a2340b64e45a81e0ef 424B / 424B                                                                                                                 0.0s
 => => sha256:6013ae1a63c2ee58a8949f03c6366a3ef6a2f386a7db27d86de2de965e9f450b 2.30kB / 2.30kB                                                                                                             0.0s
 => => sha256:d9802f032d6798e2086607424bfe88cb8ec1d6f116e11cd99592dcaf261e9cd2 27.51MB / 27.51MB                                                                                                           9.8s
 => => extracting sha256:d9802f032d6798e2086607424bfe88cb8ec1d6f116e11cd99592dcaf261e9cd2                                                                                                                  1.4s
 => [internal] load build context                                                                                                                                                                         27.3s
 => => transferring context: 3.56GB                                                                                                                                                                       27.2s
 => [base_builder 2/6] WORKDIR /tmp                                                                                                                                                                        1.6s
 => [base_builder 3/6] RUN sed -i 's@http://.*.ubuntu.com@http://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list &&     apt update &&     apt install --no-install-recommends ca-certificates -y &  84.3s
 => [build_temp 2/2] COPY . /tmp                                                                                                                                                                          15.3s
 => [copy_temp 1/1] RUN rm -rf /tmp/*.run                                                                                                                                                                  0.3s
 => [base_builder 4/6] RUN umask 0022  &&     wget https://repo.huaweicloud.com/python/3.10.5/Python-3.10.5.tar.xz &&     tar -xf Python-3.10.5.tar.xz && cd Python-3.10.5 && ./configure --prefix=/usr/  99.0s
 => [base_builder 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip3 config set global.index-url http://mirrors.aliyun.com/pypi/simple &&     pip3 config set global.trusted-host mirrors.aliyun.c  53.2s
 => [base_builder 6/6] RUN if [ ! -d "/lib64" ];     then         mkdir /lib64 && ln -sf /lib/ld-linux-aarch64.so.1 /lib64/ld-linux-aarch64.so.1;     fi                                                   0.5s
 => [cann_builder 1/3] RUN --mount=type=cache,target=/tmp,from=build_temp,source=/tmp     umask 0022 &&     mkdir -p /usr/local/Ascend/driver &&     if [ "all" != "all" ];     then         CHIPOPTION  441.9s
 => [cann_builder 2/3] RUN echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc &&     echo "source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0" >> ~/.bashrc &&     . ~/.bashrc   0.4s
 => ERROR [cann_builder 3/3] RUN --mount=type=cache,target=/root/.cache/pip     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 &&     pip3 install transformers timm &&     pip3 instal  342.3s
------
 > [cann_builder 3/3] RUN --mount=type=cache,target=/root/.cache/pip     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 &&     pip3 install transformers timm &&     pip3 install dlinfer-ascend:
0.306 ERROR: ld.so: object '/lib/aarch64-linux-gnu/libGLdispatch.so.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
0.309 ERROR: ld.so: object '/lib/aarch64-linux-gnu/libGLdispatch.so.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
0.830 Looking in indexes: http://mirrors.aliyun.com/pypi/simple
1.137 Collecting torch==2.3.1
1.339   Downloading http://mirrors.aliyun.com/pypi/packages/cb/e2/1bd899d3eb60c6495cf5d0d2885edacac08bde7a1407eadeb2ab36eca3c7/torch-2.3.1-cp310-cp310-manylinux1_x86_64.whl (779.1 MB)
107.3      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 779.1/779.1 MB 10.1 MB/s eta 0:00:00
110.0 Collecting torchvision==0.18.1
110.0   Downloading http://mirrors.aliyun.com/pypi/packages/08/04/17425bf3c0620465ee182cea5c674db4debab87ed0627145d38039cb2a9e/torchvision-0.18.1-cp310-cp310-manylinux1_x86_64.whl (7.0 MB)
110.7      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 10.3 MB/s eta 0:00:00
110.9 Collecting torch-npu==2.3.1
111.1   Downloading http://mirrors.aliyun.com/pypi/packages/a6/e1/60664898a464930397632eb718a4330dd9b394d543394fd07d7b837abef4/torch_npu-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
112.2      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.7/11.7 MB 10.8 MB/s eta 0:00:00
112.4 Collecting filelock (from torch==2.3.1)
112.5   Downloading http://mirrors.aliyun.com/pypi/packages/b9/f8/feced7779d755758a52d1f6635d990b8d98dc0a29fa568bbe0625f18fdf3/filelock-3.16.1-py3-none-any.whl (16 kB)
112.5 Collecting typing-extensions>=4.8.0 (from torch==2.3.1)
112.6   Downloading http://mirrors.aliyun.com/pypi/packages/26/9f/ad63fc0248c5379346306f8668cda6e2e2e9c95e01216d2b8ffd9ff037d0/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
112.6 Requirement already satisfied: sympy in /usr/local/python3.10.5/lib/python3.10/site-packages (from torch==2.3.1) (1.13.3)
112.7 Collecting networkx (from torch==2.3.1)
112.7   Downloading http://mirrors.aliyun.com/pypi/packages/b9/54/dd730b32ea14ea797530a4479b2ed46a6fb250f682a9cfb997e968bf0261/networkx-3.4.2-py3-none-any.whl (1.7 MB)
112.8      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 11.3 MB/s eta 0:00:00
112.9 Collecting jinja2 (from torch==2.3.1)
113.0   Downloading http://mirrors.aliyun.com/pypi/packages/31/80/3a54838c3fb461f6fec263ebf3a3a41771bd05190238de3486aae8540c36/jinja2-3.1.4-py3-none-any.whl (133 kB)
113.1 Collecting fsspec (from torch==2.3.1)
113.1   Downloading http://mirrors.aliyun.com/pypi/packages/c6/b2/454d6e7f0158951d8a78c2e1eb4f69ae81beb8dca5fee9809c6c99e9d0d0/fsspec-2024.10.0-py3-none-any.whl (179 kB)
113.3 Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.3.1)
113.3   Downloading http://mirrors.aliyun.com/pypi/packages/b6/9f/c64c03f49d6fbc56196664d05dba14e3a561038a81a638eeb47f4d4cfd48/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
115.5      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 10.7 MB/s eta 0:00:00
115.7 Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.3.1)
115.7   Downloading http://mirrors.aliyun.com/pypi/packages/eb/d5/c68b1d2cdfcc59e72e8a5949a37ddb22ae6cade80cd4a57a84d4c8b55472/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
115.8      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 11.9 MB/s eta 0:00:00
115.8 Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.3.1)
115.8   Downloading http://mirrors.aliyun.com/pypi/packages/7e/00/6b218edd739ecfc60524e585ba8e6b00554dd908de2c9c66c1af3e44e18d/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
117.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 11.1 MB/s eta 0:00:00
117.2 Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.3.1)
117.6   Downloading http://mirrors.aliyun.com/pypi/packages/ff/74/a2e2be7fb83aaedec84f391f082cf765dfb635e7caa9b49065f73e4835d8/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
193.2      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 6.6 MB/s eta 0:00:00
195.4 Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.3.1)
195.5   Downloading http://mirrors.aliyun.com/pypi/packages/37/6d/121efd7382d5b0284239f4ab1fc1590d86d34ed4a4a2fdb13b30ca8e5740/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
240.8      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 8.2 MB/s eta 0:00:00
242.1 Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.3.1)
242.1   Downloading http://mirrors.aliyun.com/pypi/packages/86/94/eb540db023ce1d162e7bea9f8f5aa781d57c65aed513c33ee9a5123ead4d/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
254.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 10.2 MB/s eta 0:00:00
254.5 Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.3.1)
254.6   Downloading http://mirrors.aliyun.com/pypi/packages/44/31/4890b1c9abc496303412947fc7dcea3d14861720642b49e8ceed89636705/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
260.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 10.2 MB/s eta 0:00:00
260.3 Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.3.1)
260.4   Downloading http://mirrors.aliyun.com/pypi/packages/bc/1d/8de1e5c67099015c834315e333911273a8c6aaba78923dd1d1e25fc5f217/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
272.5      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 10.2 MB/s eta 0:00:00
273.0 Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.3.1)
273.0   Downloading http://mirrors.aliyun.com/pypi/packages/65/5b/cfaeebf25cd9fdec14338ccb16f6b2c4c7fa9163aefcf057d86b9cc248bb/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
290.5      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 11.2 MB/s eta 0:00:00
291.2 Collecting nvidia-nccl-cu12==2.20.5 (from torch==2.3.1)
291.2   Downloading http://mirrors.aliyun.com/pypi/packages/4b/2a/0a131f572aa09f741c30ccd45a8e56316e8be8dfc7bc19bf0ab7cfef7b19/nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
306.9      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 11.3 MB/s eta 0:00:00
307.5 Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.3.1)
307.5   Downloading http://mirrors.aliyun.com/pypi/packages/da/d3/8057f0587683ed2fcd4dbfbdfdfa807b9160b809976099d36b8f60d08f03/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
307.6 Collecting triton==2.3.1 (from torch==2.3.1)
307.7   Downloading http://mirrors.aliyun.com/pypi/packages/d7/69/8a9fde07d2d27a90e16488cdfe9878e985a247b2496a4b5b1a2126042528/triton-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.1 MB)
339.7      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.1/168.1 MB 4.8 MB/s eta 0:00:00
340.3 Requirement already satisfied: numpy in /usr/local/python3.10.5/lib/python3.10/site-packages (from torchvision==0.18.1) (1.24.0)
341.0 Collecting pillow!=8.3.*,>=5.3.0 (from torchvision==0.18.1)
341.1   Downloading http://mirrors.aliyun.com/pypi/packages/41/c3/94f33af0762ed76b5a237c5797e088aa57f2b7fa8ee7932d399087be66a8/pillow-11.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (4.4 MB)
341.7      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 7.0 MB/s eta 0:00:00
341.8 INFO: pip is looking at multiple versions of torch-npu to determine which version is compatible with other requirements. This could take a while.
341.8 ERROR: Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies.
341.8
341.8 The conflict is caused by:
341.8     The user requested torch==2.3.1
341.8     torchvision 0.18.1 depends on torch==2.3.1
341.8     torch-npu 2.3.1 depends on torch==2.3.1+cpu
341.8
341.8 To fix this you could try to:
341.8 1. loosen the range of package versions you've specified
341.8 2. remove package versions to allow pip to attempt to solve the dependency conflict
341.8
341.8 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
------
Dockerfile_aarch64_ascend:110
--------------------
 109 |     # timm is required for internvl2 model
 110 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
 111 | >>>     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && \
 112 | >>>     pip3 install transformers timm && \
 113 | >>>     pip3 install dlinfer-ascend
 114 |
--------------------
ERROR: failed to solve: process "/bin/bash -c pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 &&     pip3 install transformers timm &&     pip3 install dlinfer-ascend" did not complete successfully: exit code: 1
CyCle1024 commented 5 days ago

@jiabao-wang Hi, are you building the docker image on an x86_64 platform? Currently, the Dockerfile is only supported on aarch64. For the x86_64 platform, the PyPI package of dlinfer has not been uploaded, which is also what causes the problem you mentioned above. There is a workaround for this case, but it has not been released yet.
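For context on why pip reports ResolutionImpossible here: under PEP 440, a plain pin like ==2.3.1 accepts the local build 2.3.1+cpu, but torch-npu's pin ==2.3.1+cpu rejects the plain torch-2.3.1 wheel that the mirror serves, so no single torch candidate satisfies all three packages at once. A small sketch with the packaging library (assumed to be installed) illustrates the asymmetry:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# torchvision's pin (==2.3.1) ignores local version labels,
# so it would accept the 2.3.1+cpu build...
print(Version("2.3.1+cpu") in SpecifierSet("==2.3.1"))    # True

# ...but torch-npu's pin (==2.3.1+cpu) requires the +cpu build exactly,
# so the plain torch-2.3.1 wheel does not satisfy it.
print(Version("2.3.1") in SpecifierSet("==2.3.1+cpu"))    # False
```

This is why the workaround Dockerfile below installs torch==2.3.1+cpu from the PyTorch CPU index first, then installs torch-npu in a separate step.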

CyCle1024 commented 5 days ago

@jiabao-wang Here is a new Dockerfile for the Ascend x86_64 platform. It has only been tested for building on an x86_64 machine; model inference has not been tested yet, since we don't have an x86_64 Ascend NPU machine.

FROM ubuntu:20.04 as base_builder

WORKDIR /tmp

ARG http_proxy
ARG https_proxy
ARG DEBIAN_FRONTEND=noninteractive

RUN sed -i 's@http://.*.ubuntu.com@http://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
    apt update && \
    apt install --no-install-recommends ca-certificates -y && \
    apt install --no-install-recommends bc wget -y && \
    apt install --no-install-recommends git curl gcc make g++ pkg-config unzip -y && \
    apt install --no-install-recommends libsqlite3-dev libblas3 liblapack3 gfortran vim -y && \
    apt install --no-install-recommends liblapack-dev libblas-dev libhdf5-dev libffi-dev -y && \
    apt install --no-install-recommends libssl-dev zlib1g-dev xz-utils cython3 python3-h5py -y && \
    apt install --no-install-recommends libopenblas-dev libgmpxx4ldbl liblzma-dev -y && \
    apt install --no-install-recommends libicu66 libxml2 pciutils libgl1-mesa-glx libbz2-dev -y && \
    apt install --no-install-recommends libreadline-dev libncurses5 libncurses5-dev libncursesw5 -y && \
    sed -i 's@http://mirrors.tuna.tsinghua.edu.cn@https://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
    apt clean && rm -rf /var/lib/apt/lists/*

ARG PYVERSION=3.10.5

ENV LD_LIBRARY_PATH=/usr/local/python${PYVERSION}/lib: \
    PATH=/usr/local/python${PYVERSION}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

RUN umask 0022  && \
    wget https://repo.huaweicloud.com/python/${PYVERSION}/Python-${PYVERSION}.tar.xz && \
    tar -xf Python-${PYVERSION}.tar.xz && cd Python-${PYVERSION} && ./configure --prefix=/usr/local/python${PYVERSION} --enable-shared && \
    make -j 16 && make install && \
    ln -sf /usr/local/python${PYVERSION}/bin/python3 /usr/bin/python3 && \
    ln -sf /usr/local/python${PYVERSION}/bin/python3 /usr/bin/python && \
    ln -sf /usr/local/python${PYVERSION}/bin/pip3 /usr/bin/pip3 && \
    ln -sf /usr/local/python${PYVERSION}/bin/pip3 /usr/bin/pip && \
    cd .. && \
    rm -rf Python*

RUN --mount=type=cache,target=/root/.cache/pip pip3 config set global.index-url http://mirrors.aliyun.com/pypi/simple && \
    pip3 config set global.trusted-host mirrors.aliyun.com && \
    pip3 install -U pip && \
    pip3 install wheel==0.43.0 scikit-build==0.18.0 numpy==1.24 setuptools==69.5.1 && \
    pip3 install decorator sympy cffi && \
    pip3 install cmake ninja pyyaml && \
    pip3 install pathlib2 protobuf attrs attr scipy && \
    pip3 install requests psutil absl-py

ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/hdf5/serial:$LD_LIBRARY_PATH

FROM ubuntu:20.04 as build_temp
COPY . /tmp

FROM base_builder as cann_builder

ARG ASCEND_BASE=/usr/local/Ascend
ARG TOOLKIT_PATH=$ASCEND_BASE/ascend-toolkit/latest

ENV LD_LIBRARY_PATH=\
$ASCEND_BASE/driver/lib64:\
$ASCEND_BASE/driver/lib64/common:\
$ASCEND_BASE/driver/lib64/driver:\
$ASCEND_BASE/driver/tools/hccn_tool/:\
$TOOLKIT_PATH/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/x86_64/:\
$LD_LIBRARY_PATH

# run files should be placed at the root dir of repo
ARG CHIP=all
ARG TOOLKIT_PKG=Ascend-cann-toolkit_*.run
ARG KERNELS_PKG=Ascend-cann-kernels-*.run
ARG NNAL_PKG=Ascend-cann-nnal_*.run

RUN --mount=type=cache,target=/tmp,from=build_temp,source=/tmp \
    umask 0022 && \
    mkdir -p $ASCEND_BASE/driver && \
    if [ "$CHIP" != "all" ]; \
    then \
        CHIPOPTION="--chip=$CHIP"; \
    else \
        CHIPOPTION=""; \
    fi && \
    chmod +x $TOOLKIT_PKG $KERNELS_PKG $NNAL_PKG && \
    ./$TOOLKIT_PKG --quiet --install --install-path=$ASCEND_BASE --install-for-all $CHIPOPTION && \
    ./$KERNELS_PKG --quiet --install --install-path=$ASCEND_BASE --install-for-all && \
    . /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    ./$NNAL_PKG --quiet --install --install-path=$ASCEND_BASE && \
    rm -f $TOOLKIT_PKG $KERNELS_PKG $NNAL_PKG

ENV GLOG_v=2 \
    LD_LIBRARY_PATH=$TOOLKIT_PATH/lib64:$LD_LIBRARY_PATH \
    TBE_IMPL_PATH=$TOOLKIT_PATH/opp/op_impl/built-in/ai_core/tbe \
    PATH=$TOOLKIT_PATH/ccec_compiler/bin:$PATH \
    ASCEND_OPP_PATH=$TOOLKIT_PATH/opp \
    ASCEND_AICPU_PATH=$TOOLKIT_PATH

ENV PYTHONPATH=$TBE_IMPL_PATH:$PYTHONPATH

SHELL ["/bin/bash", "-c"]
RUN echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc && \
    echo "source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0" >> ~/.bashrc && \
    . ~/.bashrc

# dlinfer
# timm is required for internvl2 model
WORKDIR /opt/
RUN --mount=type=cache,target=/root/.cache/pip \
    pip3 install torch==2.3.1+cpu torchvision==0.18.1+cpu --index-url=https://download.pytorch.org/whl/cpu && \
    pip3 install torch-npu==2.3.1 && \
    pip3 install transformers timm && \
    git clone https://github.com/DeepLink-org/dlinfer.git && \
    cd dlinfer && DEVICE=ascend python setup.py develop

# lmdeploy
FROM build_temp as copy_temp
RUN rm -rf /tmp/*.run

FROM cann_builder as final_builder
COPY --from=copy_temp /tmp /opt/lmdeploy
WORKDIR /opt/lmdeploy

RUN --mount=type=cache,target=/root/.cache/pip \
    sed -i '/triton/d' requirements/runtime.txt && \
    pip3 install -v --no-build-isolation -e .
jiabao-wang commented 4 days ago

@CyCle1024 I have built the docker image following the Dockerfile for Ascend x86_64, but an error occurs when I try to run:

docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env

output error:

(base) wjb@ubuntu-Atlas-800-Model-3010:~$ docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/generation/utils.py", line 115, in <module>
    from accelerate.hooks import AlignDevicesHook, add_hook_to_module
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/__init__.py", line 16, in <module>
    from .accelerator import Accelerator
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/accelerator.py", line 36, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 126, in <module>
    from .modeling import (
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 31, in <module>
    from ..state import AcceleratorState
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/state.py", line 64, in <module>
    if is_npu_available(check_device=False):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/utils/imports.py", line 362, in is_npu_available
    import torch_npu  # noqa: F401
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/__init__.py", line 16, in <module>
    import torch_npu.npu
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/npu/__init__.py", line 119, in <module>
    from torch_npu.utils.error_code import ErrCode, pta_error, prof_error
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
    from ._module import _apply_module_patch
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/_module.py", line 26, in <module>
    from torch_npu.npu.amp.autocast_mode import autocast
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/npu/amp/__init__.py", line 6, in <module>
    from .grad_scaler import GradScaler  # noqa: F401
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/npu/amp/grad_scaler.py", line 8, in <module>
    from torch.amp.grad_scaler import _MultiDeviceReplicator, OptState, _refresh_per_optimizer_state
ModuleNotFoundError: No module named 'torch.amp.grad_scaler'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/models/auto/modeling_auto.py", line 21, in <module>
    from .auto_factory import (
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 40, in <module>
    from ...generation import GenerationMixin
  File "", line 1075, in _handle_fromlist
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1780, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
No module named 'torch.amp.grad_scaler'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python3.10.5/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
  File "/usr/local/python3.10.5/bin/lmdeploy", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/python3.10.5/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 992, in _find_and_load_unlocked
  File "", line 241, in _call_with_frames_removed
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/opt/lmdeploy/lmdeploy/__init__.py", line 3, in <module>
    from .api import client, pipeline, serve
  File "/opt/lmdeploy/lmdeploy/api.py", line 5, in <module>
    from .archs import autoget_backend_config, get_task
  File "/opt/lmdeploy/lmdeploy/archs.py", line 6, in <module>
    from lmdeploy.serve.vl_async_engine import VLAsyncEngine
  File "/opt/lmdeploy/lmdeploy/serve/vl_async_engine.py", line 8, in <module>
    from lmdeploy.vl.engine import ImageEncoder
  File "/opt/lmdeploy/lmdeploy/vl/engine.py", line 12, in <module>
    from lmdeploy.vl.model.builder import load_vl_model
  File "/opt/lmdeploy/lmdeploy/vl/model/builder.py", line 7, in <module>
    from .internvl import InternVLVisionModel
  File "/opt/lmdeploy/lmdeploy/vl/model/internvl.py", line 7, in <module>
    from transformers import AutoConfig, AutoModel, CLIPImageProcessor
  File "", line 1075, in _handle_fromlist
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1767, in __getattr__
    value = getattr(module, name)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1780, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto.modeling_auto because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
No module named 'torch.amp.grad_scaler'
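The missing torch.amp.grad_scaler module suggests the torch actually present in the image is not the 2.3.x build that torch_npu 2.3.1 expects (torch_npu imports that module unconditionally). A quick, stdlib-only way to probe an environment for this (a diagnostic sketch; the module names are just examples) is:

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` can be located in this environment, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A parent package (e.g. torch itself) is missing entirely.
        return False

# In a broken image this check on "torch.amp.grad_scaler" comes back False,
# pointing at a torch / torch_npu version mismatch rather than an lmdeploy bug.
print(has_module("json"))                    # stdlib module, expected True
print(has_module("torch.amp.grad_scaler"))   # environment-dependent
```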

jinminxi104 commented 11 hours ago

docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env

Is this a typo? (lmdeploy-aarch64-ascend:latest)

I have a check_env result from an x86 machine, please check it. (screenshot)

jiabao-wang commented 9 hours ago

@CyCle1024 I have successfully run "docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env" after changing "pip3 install transformers" to "pip3 install transformers==4.42.3" in the Dockerfile.

But when I test this code:

from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

if __name__ == "__main__":
    pipe = pipeline("internlm/internlm2_5-7b-chat",
                    backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)

the output is:

    pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
TypeError: PytorchEngineConfig.__init__() got an unexpected keyword argument 'device_type'

my env: (screenshot)

When I run:

from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

if __name__ == "__main__":
    pipe = pipeline('OpenGVLab/InternVL2-2B',
                    backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True))
    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    response = pipe(('describe this image', image))
    print(response)

it also outputs:

Traceback (most recent call last):
  File "/opt/mycode/vl.py", line 4, in <module>
    pipe = pipeline('OpenGVLab/InternVL2-2B', backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True))
TypeError: PytorchEngineConfig.__init__() got an unexpected keyword argument 'device_type'

CyCle1024 commented 8 hours ago


@jiabao-wang https://github.com/InternLM/lmdeploy/blob/0c80baa001e79d0b7d182b8a670190801d2d8d5b/lmdeploy/messages.py#L222-L261 Refer to line 249 here: device_type is a valid argument. Which commit of lmdeploy are you using?
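Since the TypeError above comes from running an lmdeploy build that predates the device_type argument, a small guard can make that failure mode explicit before constructing the config. This is only a sketch: the 0.6 threshold is an assumption drawn from this thread (the argument is absent in v0.4.0 and present in v0.6.x), and the real fix is simply upgrading lmdeploy.

```python
def supports_device_type(ver: str) -> bool:
    """Rough heuristic: PytorchEngineConfig accepts device_type in v0.6.x
    releases but not in v0.4.0 (assumption based on this issue thread)."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) >= (0, 6)

print(supports_device_type("0.4.0"))  # False: consistent with the TypeError above
print(supports_device_type("0.6.3"))  # True
```

In practice you would feed this something like lmdeploy.__version__ and raise a clear error (or drop the argument) on old versions instead of letting __init__ fail.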

jiabao-wang commented 8 hours ago

@CyCle1024 I use lmdeploy 0.4.0 at /opt/lmdeploy. (screenshot)

But how do I use it on Huawei Ascend? I followed this article: https://lmdeploy.readthedocs.io/zh-cn/latest/get_started/ascend/get_started.html

CyCle1024 commented 8 hours ago


@jiabao-wang v0.4.0 does not support Ascend. It's odd that you are following the latest lmdeploy docs to get started on Ascend while using an old version. To enable Ascend support, use the latest release of lmdeploy, such as https://github.com/InternLM/lmdeploy/tree/v0.6.3