InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] lmdeploy 0.2.3 fails to run, reporting failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT #1169

Closed jiaenyue closed 6 months ago

jiaenyue commented 6 months ago

Describe the bug

lmdeploy 0.2.2 runs internlm2-chat-7b correctly. After upgrading to lmdeploy 0.2.3, lmdeploy serve api_server ~/Models/internlm/internlm2-chat-7b --tp 1 still starts the server, but as soon as the client sends the first request the process reports an error and exits.

[WARNING] gemm_config.in is not found; using default GEMM algo
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:     Started server process [426005]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
[AMP ERROR][CudaFrontend.cpp:94][1708430273:970157]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fb6d8ea6302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fb6d90d5471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)

With lmdeploy 0.2.2, torch 1.13.1 was installed; installing lmdeploy 0.2.3 upgraded torch to 2.1.2.

Reproduction

lmdeploy serve api_server ~/Models/internlm/internlm2-chat-7b --tp 1

Environment

(lmdeploy) root@intern-studio-40059143:~/Study# lmdeploy check_env
sys.platform: linux
Python: 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.5  (built against CUDA 11.7)
    - Built with CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

LMDeploy: 0.2.3+
transformers: 4.37.1
gradio: 3.50.2
fastapi: 0.109.2
pydantic: 2.6.1

Error traceback

(lmdeploy) root@intern-studio-40059143:~/Study/opencompass# lmdeploy serve api_server  ~/Models/internlm/internlm2-chat-7b  --tp 1
2024-02-20 20:43:02,272 - lmdeploy - WARNING - Best matched chat template name: internlm2-chat-7b
2024-02-20 20:43:02,404 - lmdeploy - WARNING - model_source: hf_model
2024-02-20 20:43:07,346 - lmdeploy - WARNING - model_config:

[llama]
model_name = internlm2-chat-7b
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 92544
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 32768
weight_type = fp16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 128
cache_chunk_size = -1
num_tokens_per_iter = 0
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 0.0
use_logn_attn = 0

2024-02-20 20:43:07,963 - lmdeploy - WARNING - get 259 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:     Started server process [462192]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
[AMP ERROR][CudaFrontend.cpp:94][1708433019:135839]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7f3258e79302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7f32590a8471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)
jiaenyue commented 6 months ago

After restricting to CUDA_VISIBLE_DEVICES=0, the same error is reported. Command: CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server ~/Models/internlm/internlm2-chat-7b --tp 1

Error:

2024-02-20 20:47:17,527 - lmdeploy - WARNING - get 259 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:     Started server process [465509]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
[AMP ERROR][CudaFrontend.cpp:94][1708433270:80921]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fb4d70fa302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fb4d7329471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)

AllentDan commented 6 months ago

lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1

Can you chat normally with this command?

jiaenyue commented 6 months ago

No, it cannot chat. The process exits as soon as the client request arrives.

(lmdeploy) root@intern-studio-40059143:~# lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1

2024-02-21 13:11:56,383 - lmdeploy - WARNING - model_source: hf_model
2024-02-21 13:11:56,383 - lmdeploy - WARNING - kwargs tp is deprecated to initialize model, use TurbomindEngineConfig instead.
2024-02-21 13:11:56,383 - lmdeploy - WARNING - kwargs cache_max_entry_count is deprecated to initialize model, use TurbomindEngineConfig instead.
2024-02-21 13:12:00,682 - lmdeploy - WARNING - model_config:

[llama]
model_name = internlm2-chat-7b
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 92544
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 32776
weight_type = fp16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 128
cache_chunk_size = -1
num_tokens_per_iter = 0
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 0.0
use_logn_attn = 0

2024-02-21 13:12:02,559 - lmdeploy - WARNING - get 259 model params
2024-02-21 13:12:14,856 - lmdeploy - WARNING - Input chat template with model_name is None. Forcing to use internlm2-chat-7b
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>>
<|im_start|>system
You are an AI assistant whose name is InternLM (书生·浦语).

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fbc6d94f302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fbc6db7e471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.

Aborted (core dumped)

AllentDan commented 6 months ago

From this log it looks like the process failed to communicate with the GPU device over PCI. Can you run other CUDA code normally? For example: python -c "import torch;print(torch.rand(1).cuda())"
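
A slightly fuller variant of the same sanity check is sketched below (standard torch calls only; the matmul size is arbitrary and only there to force a real kernel launch):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))"
# force CUDA context creation plus an actual kernel launch on the GPU
python -c "import torch; a = torch.rand(1024, 1024, device='cuda'); print((a @ a).mean().item())"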

AllentDan commented 6 months ago

Also, please paste the output of lmdeploy check_env.

jiaenyue commented 6 months ago

(lmdeploy) root@intern-studio-40059143:~/Study# lmdeploy check_env
sys.platform: linux
Python: 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:

LMDeploy: 0.2.3+
transformers: 4.37.1
gradio: 3.50.2
fastapi: 0.109.2
pydantic: 2.6.1

jiaenyue commented 6 months ago

(lmdeploy) root@intern-studio-40059143:~# python -c "import torch;print(torch.rand(1).cuda())"
tensor([0.3619], device='cuda:0')

jiaenyue commented 6 months ago

This machine works fine with lmdeploy==0.2.2; the error only appears after upgrading to 0.2.3. Some people suggested the installed torch 2.1.2 is too new, but torch 2.1.2 is the minimum that lmdeploy 0.2.3 requires.

>>> import torch
>>> torch.__version__
'2.1.2+cu121'

AllentDan commented 6 months ago

It feels like the CUDA versions are mixed up. In your environment the CUDA version is 11.7, while the installed lmdeploy presumably corresponds to CUDA 12.1. You can check which CUDA libraries the installed lmdeploy depends on, with something like objdump -x lmdeploy/lib/_turbomind.cpython-310-x86_64-linux-gnu.so | grep NEEDED
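
One way to run that check end to end is sketched below (the SO_DIR lookup via lmdeploy.__file__ is just for illustration; the .so filename matches the one reported later in this thread):

SO_DIR=$(python -c "import lmdeploy, os; print(os.path.join(os.path.dirname(lmdeploy.__file__), 'lib'))")
objdump -x "$SO_DIR"/_turbomind.cpython-310-x86_64-linux-gnu.so | grep NEEDED        # CUDA libraries the extension was linked against
ldd "$SO_DIR"/_turbomind.cpython-310-x86_64-linux-gnu.so | grep -E 'cudart|cublas'   # which library files actually resolve at runtime
python -c "import torch; print(torch.version.cuda)"                                  # CUDA version the installed torch was built with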

irexyc commented 6 months ago

The pytorch requirement should be torch<=2.1.2,>=2.0.0; 2.1.2 is not a minimum requirement.

If you used pip install lmdeploy, the package installed is lmdeploy+cu118; I am not sure whether the conflict comes from having multiple CUDA versions. Your installed pytorch is 2.1.2+cu121, so you could try reinstalling a cu118 build: pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

jiaenyue commented 6 months ago

The CUDA libraries that lmdeploy depends on are as follows:

(lmdeploy) root@intern-studio-40059143:~# objdump -x ./.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/lib/_turbomind.cpython-310-x86_64-linux-gnu.so | grep NEEDED
  NEEDED               libnccl.so.2
  NEEDED               libcudart.so.11.0
  NEEDED               libcublas.so.11
  NEEDED               libcublasLt.so.11
  NEEDED               libcurand.so.10
  NEEDED               libdl.so.2
  NEEDED               libstdc++.so.6
  NEEDED               libm.so.6
  NEEDED               libgcc_s.so.1
  NEEDED               libc.so.6
  NEEDED               ld-linux-x86-64.so.2

jiaenyue commented 6 months ago

Reinstalled the cu118 build with pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118, then reinstalled with pip install lmdeploy==0.2.3. lmdeploy chat reports the same error.

(lmdeploy) root@intern-studio-40059143:~# lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1
2024-02-21 14:40:04,725 - lmdeploy - WARNING - model_source: hf_model
2024-02-21 14:40:04,726 - lmdeploy - WARNING - kwargs tp is deprecated to initialize model, use TurbomindEngineConfig instead.
2024-02-21 14:40:04,726 - lmdeploy - WARNING - kwargs cache_max_entry_count is deprecated to initialize model, use TurbomindEngineConfig instead.
2024-02-21 14:40:09,417 - lmdeploy - WARNING - model_config:

[llama]
model_name = internlm2-chat-7b
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 92544
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 32776
weight_type = fp16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 128
cache_chunk_size = -1
num_tokens_per_iter = 0
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 0.0
use_logn_attn = 0

2024-02-21 14:40:09,921 - lmdeploy - WARNING - get 259 model params
2024-02-21 14:40:21,620 - lmdeploy - WARNING - Input chat template with model_name is None. Forcing to use internlm2-chat-7b
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>> nihao

<|im_start|>system
You are an AI assistant whose name is InternLM (书生·浦语).

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7f61482ea302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7f6148519471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)

Also, lmdeploy check_env no longer runs:

(lmdeploy) root@intern-studio-40059143:~# lmdeploy check_env

Traceback (most recent call last):
  File "/root/.conda/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 18, in run
    args.run(args)
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/cli.py", line 131, in check_env
    env_info = collect_env()
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/mmengine/utils/dl_utils/collect_env.py", line 156, in collect_env
    import torchvision
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 7, in <module>
    import torchvision.extension  # noqa: F401
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/torchvision/extension.py", line 92, in <module>
    _check_cuda_version()
  File "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/torchvision/extension.py", line 78, in _check_cuda_version
    raise RuntimeError(
RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA major versions. PyTorch has CUDA Version=12.1 and torchvision has CUDA Version=11.8. Please reinstall the torchvision that matches your PyTorch install.

irexyc commented 6 months ago

How about uninstalling pytorch and torchvision first and then reinstalling them. Don't reinstall lmdeploy; if you install other packages after installing pytorch, pytorch may get upgraded again.

The log at the very bottom contains this message:

RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA major versions. PyTorch has CUDA Version=12.1 and torchvision has CUDA Version=11.8. Please reinstall the torchvision that matches your PyTorch install.

jiaenyue commented 6 months ago

I recreated the conda environment:

conda create -n lmdeploy_0.2.3 python=3.10
conda activate lmdeploy_0.2.3
pip install lmdeploy
lmdeploy --version
lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1   # fails with the same error

pip uninstall torch torchvision
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1   # fails with the same error

(lmdeploy_0.2.3) root@intern-studio-40059143:~# lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1
/root/.conda/envs/lmdeploy_0.2.3/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
2024-02-21 15:04:03,552 - lmdeploy - WARNING - model_source: hf_model
2024-02-21 15:04:03,552 - lmdeploy - WARNING - kwargs tp is deprecated to initialize model, use TurbomindEngineConfig instead.
2024-02-21 15:04:03,552 - lmdeploy - WARNING - kwargs cache_max_entry_count is deprecated to initialize model, use TurbomindEngineConfig instead.
2024-02-21 15:04:07,993 - lmdeploy - WARNING - model_config:

[llama]
model_name = internlm2-chat-7b
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 92544
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 32776
weight_type = fp16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 128
cache_chunk_size = -1
num_tokens_per_iter = 0
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
rope_scaling_factor = 0.0
use_logn_attn = 0

2024-02-21 15:04:08,463 - lmdeploy - WARNING - get 259 model params
2024-02-21 15:04:16,284 - lmdeploy - WARNING - Input chat template with model_name is None. Forcing to use internlm2-chat-7b
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>> nihao

<|im_start|>system
You are an AI assistant whose name is InternLM (书生·浦语).

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fcdb1fbd302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fcdb21ec471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)

irexyc commented 6 months ago

What is the output of nvidia-smi?

jiaenyue commented 6 months ago

[Screenshot: nvidia-smi output, 2024-02-21 3:35 PM]

irexyc commented 6 months ago

@jiaenyue

One more thought: could you print echo $LD_LIBRARY_PATH and take a look? If it contains /usr/local/cuda/lib64, which CUDA version is that?

pip install lmdeploy automatically installs the related CUDA runtime, but if LD_LIBRARY_PATH is set, the libraries found through LD_LIBRARY_PATH take priority. Check whether the conflict comes from there; you could try removing the CUDA-related paths from it.
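
A minimal sketch of that check and workaround (assuming /usr/local/cuda is the system CUDA path, as above; the change applies to the current shell only):

echo $LD_LIBRARY_PATH
# drop any CUDA-related entries, then try starting the server again
export LD_LIBRARY_PATH=$(echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -v '/usr/local/cuda' | paste -sd:)
lmdeploy serve api_server ~/Models/internlm/internlm2-chat-7b --tp 1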

jiaenyue commented 6 months ago

echo $LD_LIBRARY_PATH returns nothing; the variable is not set in the environment.

The CUDA files in /usr/local/cuda/lib64/ are:

(lmdeploy_0.2.3) root@intern-studio-40059143:~# ll /usr/local/cuda/lib64/libcuda*
-rw-r--r-- 1 root root  865940 Jun 9 2022 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root      17 Jun 9 2022 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.11.0
lrwxrwxrwx 1 root root      20 Jun 9 2022 /usr/local/cuda/lib64/libcudart.so.11.0 -> libcudart.so.11.7.99
-rw-r--r-- 1 root root  671072 Jun 9 2022 /usr/local/cuda/lib64/libcudart.so.11.7.99
-rw-r--r-- 1 root root 1178522 Jun 9 2022 /usr/local/cuda/lib64/libcudart_static.a

The output of printenv is as follows:

(lmdeploy_0.2.3) root@intern-studio-40059143:~# printenv
SHELL=/bin/bash
no_proxy=localhost,127.0.0.1,0.0.0.0,172.18.47.140
CONDA_EXE=/root/.conda/bin/conda
_CE_M=
SSH_AUTH_SOCK=/tmp/ssh-I0aGu3InOj/agent.342
FB_USERNAME=40059143
PWD=/root
LOGNAME=root
CONDA_PREFIX=/root/.conda/envs/lmdeploy_0.2.3
AIDE_STORAGE_QUOTA=102400
MOTD_SHOWN=pam
HOME=/root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:*.xspf=00;36:
CONDA_PROMPT_MODIFIER=(lmdeploy_0.2.3)
LC_TERMINAL=iTerm2
https_proxy=http://proxy.intern-ai.org.cn:50000
SSH_CONNECTION=127.0.0.1 47680 127.0.0.1 22
LESSCLOSE=/usr/bin/lesspipe %s %s
TERM=xterm-256color
_CE_CONDA=
LESSOPEN=| /usr/bin/lesspipe %s
USER=root
CONDA_SHLVL=2
LC_TERMINAL_VERSION=3.3.8
SHLVL=2
http_proxy=http://proxy.intern-ai.org.cn:50000
CONDA_PYTHON_EXE=/root/.conda/bin/python
LC_CTYPE=C.UTF-8
SSH_CLIENT=127.0.0.1 47680 22
CONDA_DEFAULT_ENV=lmdeploy_0.2.3
AIDE_INSTANCE_ID=20240207-0b9680e-40059143
HF_ENDPOINT=https://hf-mirror.com
PATH=/root/.local/bin:/root/.conda/envs/lmdeploy_0.2.3/bin:/root/.conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
QUOTA_STATUS=1
SSH_TTY=/dev/pts/0
CONDA_PREFIX_1=/root/.conda
AIDE_BASEURL=http://studio-in.intern-ai.org.cn
_=/usr/bin/printenv

irexyc commented 6 months ago

I can't think of a cause for the moment. The only difference in dependencies between 0.2.2 and 0.2.3 is here:

[Image: requirements diff between lmdeploy 0.2.2 and 0.2.3]

Does lmdeploy 0.2.2 run normally if you install it in this lmdeploy_0.2.3 environment?

jiaenyue commented 6 months ago

0.2.2 runs normally. I first noticed this problem when upgrading to 0.2.3 before the Spring Festival; during the holiday I also tried installing torch 2.1.0 and torch 2.1.1, and neither worked. I later asked around in the user group, and someone suspected that 0.2.3's requirement of torch >= 2.0.0 causes an incompatibility with the intern-studio system.

irexyc commented 6 months ago

I reproduced it on an openxlab machine. It is related to the pytorch version; the exact cause is still unclear. My guess is that it may be related to the vGPU.

I tried pytorch 2.0.0 and it works: pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

jiaenyue commented 6 months ago

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118 installs torch==2.0.0 successfully.

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 lmdeploy also installs successfully, but it gives lmdeploy==0.2.2.

After pinning lmdeploy==0.2.3, it reports a conflict: pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 lmdeploy==0.2.3

ERROR: Cannot install lmdeploy==0.2.3 and torch==2.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    torch 2.0.0 depends on triton==2.0.0; platform_system == "Linux" and platform_machine == "x86_64"
    lmdeploy 0.2.3 depends on triton<2.2.0 and >=2.1.0

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

irexyc commented 6 months ago

If you are running the turbomind engine on openxlab, install lmdeploy first and then install pytorch 2.0.0. The conflict can be ignored; triton is not used.

jiaenyue commented 6 months ago

It works now, thanks. The exact steps are as follows:

conda create -n lmdeploy-0.2.4 python==3.10 -y

conda activate lmdeploy-0.2.4

pip install lmdeploy

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

lmdeploy chat turbomind ~/Models/internlm/internlm2-chat-7b --tp 1

Everything runs normally now.