InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Qwen2 series models #1777

Closed: Vincent131499 closed this issue 3 months ago

Vincent131499 commented 3 months ago

Motivation

The qwen1.5 series appears to be supported, covering 1.8b through 110b. Is the newly released qwen2 series also fully supported, including qwen2-1.5b, 7b, moe-14b, 72b, etc.?

Related resources

No response

Additional context

No response

lvhan028 commented 3 months ago

Yes, it is supported.

Vincent131499 commented 3 months ago

I tested qwen2-1.5b-instruct. Deployment image: v0.4.2. Serve command:

lmdeploy serve api_server ../pretrained-models/qwen2-1.5b-instruct/ \
  --log-level INFO \
  --backend turbomind \
  --model-format hf \
  --model-name qwen \
  --server-port 23334 \
  --tp 1 \
  --session-len 16384 \
  --max-batch-size 4 \
  --quant-policy 8 \
  --cache-max-entry-count 0.8 \
  --enable-prefix-caching

curl request:

curl http://0.0.0.0:23334/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen",
        "stream": false,
        "top_p": 0.95,
        "top_k": 40,
        "temperature": 0.2,
        "repetition_penalty": 1.2,
        "messages": [ {"role": "user", "content": "你叫什么名字"} ]
      }'
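For reference, the same request can be built in Python using only the standard library; this is a sketch that mirrors the curl call above (host, port, and sampling parameters are taken from the report, everything else is generic):

```python
import json
import urllib.request

# Payload identical to the curl -d body from the report.
payload = {
    "model": "qwen",
    "stream": False,
    "top_p": 0.95,
    "top_k": 40,
    "temperature": 0.2,
    "repetition_penalty": 1.2,
    "messages": [{"role": "user", "content": "你叫什么名字"}],
}

# Build (but do not yet send) the POST request to the
# OpenAI-compatible /v1/chat/completions endpoint.
req = urllib.request.Request(
    "http://0.0.0.0:23334/v1/chat/completions",
    data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it against a running server.
```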

The server then logs the following error:

2024-06-14 05:38:45,414 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n你叫什么名字<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=16361, top_p=0.95, top_k=40, temperature=0.2, repetition_penalty=1.2, ignore_eos=False, random_seed=13484932582757945445, stop_words=[151645], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 56568, 99882, 99245, 101419, 151645, 198, 151644, 77091, 198], adapter_name=None.
2024-06-14 05:38:45,414 - lmdeploy - INFO - session_id=2, history_tokens=0, input_tokens=23, max_new_tokens=16361, seq_start=True, seq_end=True, step=0, prep=True
[TM][INFO] Set logger level by INFO
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][WARNING] [ProcessInferRequests] Request for 2 received.
[TM][WARNING] [ProcessInferRequests] [2] total sequence length (23 + 16361) exceeds session_len (16384), request_output_len is truncated to 16360
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 23, max_q = 23, max_k = 23
[TM][INFO] ------------------------- step = 30 -------------------------
(step counter lines repeat every 10 steps; intermediate lines omitted)
[TM][INFO] ------------------------- step = 670 -------------------------
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [Interrupt] slot = 0, id = 2
[TM][INFO] [forward] Request complete for 2, code 0
[TM][INFO] [forward] Request complete for 2, code 0
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/opt/py38/lib/python3.8/site-packages/starlette/responses.py", line 265, in call
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/opt/py38/lib/python3.8/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/opt/py38/lib/python3.8/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/opt/py38/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.8/asyncio/locks.py", line 309, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

The corresponding streaming output is also abnormal: data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"统领"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"郑州"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"出して"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"anonymous"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"毒"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"พุทธ"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"\tse"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"�鹮"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"rowsers"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" sketch"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" too"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" STACK"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":" rift"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"CAS"},"logprobs":null,"finish_reason":null}]}

data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"-cultural"},"logprobs":null,"finish_reason":null}]}
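To see the garbled text the server actually produced, the streamed deltas can be reassembled from the `data:` lines above. This is a hypothetical helper, not part of lmdeploy; it assumes the OpenAI-style chunk schema shown in the chunks:

```python
import json

def join_stream_deltas(sse_lines):
    """Concatenate choices[0].delta.content across 'data:' SSE chunks."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blanks and non-data SSE fields
        chunk = json.loads(line[len("data:"):])
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)

# Two of the chunks from the report above, verbatim.
sample = [
    'data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"统领"},"logprobs":null,"finish_reason":null}]}',
    'data: {"id":"2","object":"chat.completion.chunk","created":1718343525,"model":"qwen","choices":[{"index":0,"delta":{"role":"assistant","content":"郑州"},"logprobs":null,"finish_reason":null}]}',
]
print(join_stream_deltas(sample))  # -> 统领郑州
```

Joining all the chunks this way makes it obvious the tokens are unrelated fragments in mixed languages, i.e. garbage output rather than a decoding artifact on the client side.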

Vincent131499 commented 3 months ago

Verified with qwen2-7b-instruct using the same deployment and curl commands: the inference results are normal.

AllentDan commented 3 months ago

Confirmed: qwen2-1.5b-instruct indeed fails, and the output is garbled.

Vincent131499 commented 3 months ago

Confirmed: qwen2-1.5b-instruct indeed fails, and the output is garbled.

@AllentDan Will qwen2-1.5b be adapted to turbomind soon? And what about moe-14b?

lvhan028 commented 3 months ago

TurboMind support for MoE models is on the roadmap; @lzhangzz will be responsible for it. We cannot give a concrete timeline yet.

Vincent131499 commented 3 months ago

TurboMind support for MoE models is on the roadmap; @lzhangzz will be responsible for it. We cannot give a concrete timeline yet.

@lvhan028 Great! Will the qwen2-1.5b issue be fixed soon?

lvhan028 commented 3 months ago

Please take a look at https://github.com/InternLM/lmdeploy/pull/1782, which fixes this.