InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0
4.7k stars 430 forks

[Bug] Running Phi-3-vision-128k-instruct on 8 GPUs raises "Expected all tensors to be on the same device, but found at least two devices" #2633

Open dreamerlin opened 1 month ago

dreamerlin commented 1 month ago

Checklist

Describe the bug

image

Reproduction

import lmdeploy
from lmdeploy import ChatTemplateConfig, PytorchEngineConfig

backend_config = PytorchEngineConfig(tp=8, session_len=session_len)
pipe = lmdeploy.pipeline(args.checkpoint,
                         backend_config=backend_config,
                         chat_template_config=ChatTemplateConfig(model_name='phi-3'))

Environment

sys.platform: linux
Python: 3.9.19 (main, May  6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (GCC) 9.4.0
PyTorch: 2.0.1
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.15.2
LMDeploy: 0.6.1+2323e69
transformers: 4.45.2
gradio: 4.44.1
fastapi: 0.103.2
pydantic: 2.9.2
triton: 3.0.0
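
The environment above reports `CUDA available: False`, while the traceback refers to `cuda:5` and `cuda:6`, so the environment dump may have been collected outside the GPU job. The following is a minimal sketch, assuming only that PyTorch is installed, for checking how many GPUs the process that runs the pipeline can actually see. `describe_devices` is a hypothetical helper name and is not part of LMDeploy.

```python
import torch


def describe_devices():
    """Return a small summary of the CUDA devices visible to PyTorch."""
    available = torch.cuda.is_available()
    # device_count() is only queried when CUDA is usable in this process.
    count = torch.cuda.device_count() if available else 0
    names = [torch.cuda.get_device_name(i) for i in range(count)]
    return {
        "cuda_available": available,
        "device_count": count,
        "device_names": names,
    }


summary = describe_devices()
print(summary)
```

Running this inside the same job that launches the pipeline should show whether all 8 GPUs requested by `tp=8` are visible.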

Error traceback

Traceback (most recent call last):
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/site-packages/lmdeploy/vl/engine.py", line 27, in _raise_exception_on_finish
    raise e
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/site-packages/lmdeploy/vl/engine.py", line 23, in _raise_exception_on_finish
    task.result()
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/site-packages/lmdeploy/vl/engine.py", line 169, in forward
    outputs = self.model.forward(*func_inputs)
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/site-packages/lmdeploy/vl/model/phi3_vision.py", line 193, in forward
    image_features = _process_image_embedding(
  File "/mnt/petrelfs/wangweiyun/miniconda3/envs/lmdploy/lib/python3.9/site-packages/lmdeploy/vl/model/phi3_vision.py", line 64, in _process_image_embedding
    glb_img = torch.cat([glb_img, temp_glb_GN],
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:5 and cuda:6! (when checking argument for argument tensors in method wrapper_CUDA_cat)
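
The `RuntimeError` above is raised by `torch.cat` when its inputs live on different devices. The following is a minimal sketch of one general workaround: move every input to a single device before concatenating. The helper name `cat_on_common_device`, the tensor shapes, and the `dim` value are illustrative assumptions. This is not the LMDeploy fix, and it does not settle whether the vision model's parameters should end up on different GPUs in the first place.

```python
import torch


def cat_on_common_device(tensors, dim=0, device=None):
    """Concatenate tensors after moving them all to one device.

    If `device` is None, the device of the first tensor is used.
    """
    if not tensors:
        raise ValueError("expected at least one tensor")
    target = device if device is not None else tensors[0].device
    # Tensor.to() returns the tensor unchanged when it is already on `target`.
    return torch.cat([t.to(target) for t in tensors], dim=dim)


# Illustrative shapes only. On a multi-GPU machine the two inputs could sit
# on e.g. cuda:5 and cuda:6; here both are on the CPU so the sketch runs anywhere.
features = torch.zeros(1, 12, 12, 8)
separator = torch.ones(1, 12, 1, 8)

merged = cat_on_common_device([features, separator], dim=2)
print(merged.shape, merged.device)
```

Copying across devices on every forward pass has a cost, so placing the parameters on one device when the model is loaded is usually preferable to patching each `torch.cat` call.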

dreamerlin commented 1 month ago

This was run on 8 GPUs.

dreamerlin commented 1 month ago

By the way, is there a problem with this line of code? https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/vl/model/phi3_vision.py#L61 Shouldn't it be:

temp_glb_GN = self.glb_GN.repeat(1, H // 2, 1, 1)

dreamerlin commented 1 month ago

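After patching the code myself (changing only the device-related code), I ran an 8k input with 2 images on a text needle task, and the output is wrong:

image

Are you sure the Phi code logic is correct?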

RunningLeon commented 1 month ago

@dreamerlin Hi, it seems that the implementation in lmdeploy is based on an old version of the Phi-3 model; see this commit: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/commit/866d1691437a49af79d5f3ad4a34c1750e08d163 . We may update it later. By the way, could you provide sample code with the image files to reproduce the issue? Thanks.