Closed: RexWzh closed this issue 5 months ago.
The 7B model runs fine with a similar command:
python -m vllm.entrypoints.openai.api_server \
--model /sshfs/pretrains/Qwen/Qwen2-7B-Instruct \
--trust-remote-code --tensor-parallel-size 2 --served-model-name qwen \
--max-model-len 4096
Hi, what is your pytorch cuda version and nvidia driver version?
❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
❯ python -V
Python 3.10.14
❯ pip list | grep torch
torch 2.3.0
❯ pip list | grep cuda
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
And Nvidia driver
❯ nvidia-smi
Fri Jun 7 17:22:52 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 26W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:25:00.0 Off | N/A |
| 46% 48C P2 111W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce RTX 3090 Off | 00000000:41:00.0 Off | N/A |
| 40% 49C P2 113W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce RTX 3090 Off | 00000000:61:00.0 Off | N/A |
| 34% 44C P2 121W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce RTX 3090 Off | 00000000:81:00.0 Off | N/A |
| 39% 47C P2 109W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce RTX 3090 Off | 00000000:A1:00.0 Off | N/A |
| 39% 47C P2 121W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce RTX 3090 Off | 00000000:C1:00.0 Off | N/A |
| 45% 49C P2 110W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce RTX 3090 Off | 00000000:E1:00.0 Off | N/A |
| 31% 46C P2 108W / 350W | 17259MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Hi, the PyTorch CUDA version can be confirmed with python -c "import torch; print(torch.version.cuda)".
OK, it is 2.3.0+cu121:
❯ python -c "import torch; print(torch.version.cuda)"
12.1
❯ python -c "import torch; print(torch.__version__)"
2.3.0+cu121
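As a sanity check on the version mix above: the CUDA version reported by nvidia-smi (12.2 here) only needs to be greater than or equal to the CUDA runtime PyTorch was built against (12.1 here). A minimal sketch of that comparison (the helper function is mine, not from any library):

```python
def cuda_versions_compatible(driver_cuda: str, torch_cuda: str) -> bool:
    """Return True if the driver's supported CUDA version covers the
    runtime version PyTorch was built with (major.minor comparison)."""
    def parse(v: str) -> tuple:
        return tuple(int(x) for x in v.split(".")[:2])
    return parse(driver_cuda) >= parse(torch_cuda)

# Versions taken from the outputs above: nvidia-smi reports 12.2,
# torch.version.cuda reports 12.1.
print(cuda_versions_compatible("12.2", "12.1"))  # True: this pairing is fine
print(cuda_versions_compatible("11.4", "12.1"))  # False: driver too old
```

So the 535.161.07 driver with CUDA 12.2 support should be able to run the cu121 PyTorch wheel; a mismatch in this direction would not by itself explain garbled output.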
It works fine on another server with CUDA 11.6:
❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
❯ pip list | grep -P "vllm|torch|cuda"
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
torch 2.3.0
vllm 0.4.3
vllm-flash-attn 2.5.8.post
❯ curl -H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-X POST \
-d '{"model": "qwen", "messages": [{"role": "user", "content": "介绍你自己"}], "stream":false}' \
http://localhost:8000/v1/chat/completions
{"id":"cmpl-ea52ccfc99bf45d3999e3873c19be2f7","object":"chat.completion","created":1717765410,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","content":"我是来自阿里云的大规模语言模型,我叫通义千问。我是阿里云自主研发的超大规模语言模型,也能够生成与人类相似的文本,比如写故事、写公文、写邮件、写剧本等等。同时,我也能够帮助人们回答问题、创作文字,比如写故事、写公文、写邮件、写剧本等等,还能表达观点,玩游戏。如果您有任何问题或需要帮助,请随时告诉我,我会尽力提供支持。"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":21,"total_tokens":120,"completion_tokens":99}}
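For reference, the same request can be assembled and sent from Python using only the standard library (a sketch; the endpoint and served model name are taken from the curl call above, and the helper names are mine):

```python
import json
import urllib.request

def build_chat_request(model: str, user_content: str) -> dict:
    """Assemble a /v1/chat/completions payload like the curl example above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": False,
    }

def post_chat(url: str, payload: dict, api_key: str) -> dict:
    """POST the payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("qwen", "介绍你自己")
# Uncomment with a running server:
# reply = post_chat("http://localhost:8000/v1/chat/completions", payload, "EMPTY")
# print(reply["choices"][0]["message"]["content"])
```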
Thanks for your reply. I am not sure what caused the problem, but it suddenly started working, with no garbled text.
I ran into the same problem: inference results are occasionally garbled (with Chinese words mixed in).
It is not quite the same as https://github.com/QwenLM/Qwen2/issues/485; I am using vLLM, and the garbled text is not just plain letters.
Launch command:
and
Configuration: 8 × RTX 3090