InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Qwen2 series models #1777

Closed: Vincent131499 closed this issue 3 months ago

Vincent131499 commented 3 months ago

Motivation

The qwen1.5 series appears to be supported, covering 1.8b through 110b. Is the newly released qwen2 series also fully supported, including qwen2-1.5b, 7b, moe-14b, 72b, etc.?

Related resources

No response

Additional context

No response

lvhan028 commented 3 months ago

Yes, it is supported.

Vincent131499 commented 3 months ago

I tested qwen2-1.5b-instruct. Deployment image: v0.4.2. Serve command:

lmdeploy serve api_server ../pretrained-models/qwen2-1.5b-instruct/ \
  --log-level INFO \
  --backend turbomind \
  --model-format hf \
  --model-name qwen \
  --server-port 23334 \
  --tp 1 \
  --session-len 16384 \
  --max-batch-size 4 \
  --quant-policy 8 \
  --cache-max-entry-count 0.8 \
  --enable-prefix-caching

curl request:

curl http://0.0.0.0:23334/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen",
        "stream": false,
        "top_p": 0.95,
        "top_k": 40,
        "temperature": 0.2,
        "repetition_penalty": 1.2,
        "messages": [ {"role": "user", "content": "你叫什么名字"} ]
      }'
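For reference, the same request can be built in Python using only the standard library; this is a sketch that mirrors the curl call above (host, port, and sampling parameters are taken from the report, everything else is generic):

```python
import json
import urllib.request

# Payload identical to the curl -d body from the report.
payload = {
    "model": "qwen",
    "stream": False,
    "top_p": 0.95,
    "top_k": 40,
    "temperature": 0.2,
    "repetition_penalty": 1.2,
    "messages": [{"role": "user", "content": "你叫什么名字"}],
}

# Build (but do not yet send) the POST request to the
# OpenAI-compatible /v1/chat/completions endpoint.
req = urllib.request.Request(
    "http://0.0.0.0:23334/v1/chat/completions",
    data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it against a running server.
```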

The server then logs the following error:

2024-06-14 05:38:45,414 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n你叫什么名字<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=16361, top_p=0.95, top_k=40, temperature=0.2, repetition_penalty=1.2, ignore_eos=False, random_seed=13484932582757945445, stop_words=[151645], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 56568, 99882, 99245, 101419, 151645, 198, 151644, 77091, 198], adapter_name=None.
2024-06-14 05:38:45,414 - lmdeploy - INFO - session_id=2, history_tokens=0, input_tokens=23, max_new_tokens=16361, seq_start=True, seq_end=True, step=0, prep=True
[TM][INFO] Set logger level by INFO
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][WARNING] [ProcessInferRequests] Request for 2 received.
[TM][WARNING] [ProcessInferRequests] [2] total sequence length (23 + 16361) exceeds session_len (16384), request_output_len is truncated to 16360
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 23, max_q = 23, max_k = 23
[TM][INFO] ------------------------- step = 30 -------------------------
(step counter lines repeat every 10 steps; intermediate lines omitted)
[TM][INFO] ------------------------- step = 670 -------------------------
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [Interrupt] slot = 0, id = 2
[TM][INFO] [forward] Request complete for 2, code 0
[TM][INFO] [forward] Request complete for 2, code 0
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/opt/py38/lib/python3.8/site-packages/starlette/responses.py", line 265, in call
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/opt/py38/lib/python3.8/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/opt/py38/lib/python3.8/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/opt/py38/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.8/asyncio/locks.py", line 309, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

The corresponding streaming output is also abnormal: data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"统领"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"郑州"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"出して"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"anonymous"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"毒"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"พุทธ"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"\tse"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"�鹮"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"rowsers"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" sketch"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" too"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" STACK"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" rift"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"CAS"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"-cultural"},"logprobs":null,"finish_reason":null}]}
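To see the garbled text the server actually produced, the streamed deltas can be reassembled from the `data:` lines above. This is a hypothetical helper, not part of lmdeploy; it assumes the OpenAI-style chunk schema shown in the chunks:

```python
import json

def join_stream_deltas(sse_lines):
    """Concatenate choices[0].delta.content across 'data:' SSE chunks."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blanks and non-data SSE fields
        chunk = json.loads(line[len("data:"):])
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)

# Two of the chunks from the report above, verbatim.
sample = [
    'data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"统领"},"logprobs":null,"finish_reason":null}]}',
    'data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"郑州"},"logprobs":null,"finish_reason":null}]}',
]
print(join_stream_deltas(sample))  # -> 统领郑州
```

Joining all the chunks this way makes it obvious the tokens are unrelated fragments in mixed languages, i.e. garbage output rather than a decoding artifact on the client side.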

Vincent131499 commented 3 months ago

Verified with qwen2-7b-instruct using the same deployment and curl commands: the inference results are normal.

AllentDan commented 3 months ago

Confirmed: qwen2-1.5b-instruct indeed fails, and the output is garbled.

Vincent131499 commented 3 months ago

Confirmed: qwen2-1.5b-instruct indeed fails, and the output is garbled.

@AllentDan Will qwen2-1.5b be adapted to turbomind soon? And what about moe-14b?

lvhan028 commented 3 months ago

TurboMind support for MoE models is on the roadmap; @lzhangzz will be responsible for it. We cannot give a concrete timeline yet.

Vincent131499 commented 3 months ago

TurboMind support for MoE models is on the roadmap; @lzhangzz will be responsible for it. We cannot give a concrete timeline yet.

@lvhan028 Great! Will the qwen2-1.5b issue be fixed soon?

lvhan028 commented 3 months ago

Please take a look at https://github.com/InternLM/lmdeploy/pull/1782, which fixes this.