jianliao closed this issue 3 months ago.
The related PRs #1984 and #1913 haven't been merged yet.
What is your lmdeploy version? @jianliao The latest lmdeploy can run the model with the default turbomind backend.
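Upgrading is a one-liner with pip (assuming a pip-managed install, which matches the pip show output below):
> pip install -U lmdeploy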
@AllentDan I upgraded to the latest version (0.5.2.post1), but I am still encountering the same error with the following command:
lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ
Here are the details of my lmdeploy version:
(lmdeploy) jianliao@jianliao-ubuntu:~$ pip show lmdeploy
Name: lmdeploy
Version: 0.5.2.post1
Summary: A toolset for compressing, deploying and serving LLM
Home-page:
Author: OpenMMLab
Author-email: openmmlab@gmail.com
License:
Location: /home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages
Requires: accelerate, einops, fastapi, fire, mmengine-lite, numpy, nvidia-cublas-cu12, nvidia-cuda-runtime-cu12, nvidia-curand-cu12, nvidia-nccl-cu12, peft, pillow, protobuf, pydantic, pynvml, safetensors, sentencepiece, shortuuid, tiktoken, torch, torchvision, transformers, triton, uvicorn
Required-by:
Can you try adding --model-format awq?
@AllentDan @lvhan028 The issue was resolved after adding the --model-format awq option. Thanks, bro.
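For reference, the full working command combines the original invocation with the suggested flag:
> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --model-format awq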
Describe the bug
> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ
If I switch the backend, it runs, but it outputs a huge volume of logs; see the attached bug.log for details.
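If the pytorch backend floods the console, one thing worth trying is raising the log threshold (a sketch assuming lmdeploy's --log-level option; check lmdeploy serve api_server --help for the exact choices):
> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --backend pytorch --log-level ERROR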
Reproduction
> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ
or
> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --backend pytorch
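Once the server is up, a quick way to verify it responds is to hit its OpenAI-compatible endpoints (a minimal check, assuming the default port 23333; adjust if you set --server-port):
> curl http://localhost:23333/v1/models
> curl http://localhost:23333/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "InternVL2-2B-AWQ", "messages": [{"role": "user", "content": "Hello"}]}'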