InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Llama 3.1 Support #2117

Open vladrad opened 1 month ago

vladrad commented 1 month ago

Checklist

Describe the bug

I am running into errors when running the latest Llama 3.1 AWQ model with the latest Docker image. I believe support may need to be added for this model.

Reproduction

docker run --runtime nvidia --gpus '"device=2"' -v ~/.cache/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=TOKEN" -p 23333:23333 --ipc=host openmmlab/lmdeploy:latest lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --backend turbomind --model-format awq

Environment

Latest docker cloned

Error traceback

No response

AllentDan commented 1 month ago

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.
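
For reference, the AWQ quantization being described is done with lmdeploy's `lite auto_awq` command; the command below is an illustrative sketch (model path, flag values, and output directory are placeholders, not taken from this thread):

# Illustrative only: 4-bit AWQ quantization of the 8B instruct model
lmdeploy lite auto_awq meta-llama/Meta-Llama-3.1-8B-Instruct \
    --w-bits 4 \
    --w-group-size 128 \
    --work-dir ./llama3_1-8b-instruct-awq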

medwang1 commented 1 month ago

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

Are the 70B and 405B models also supported? @AllentDan

Yoosu-L commented 1 month ago

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

The server starts successfully, but I get this error during conversation:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7985a007bf10

During handling of the above exception, another exception occurred:

vladrad commented 1 month ago

Sorry all, this could have been written better, but @Yoosu-L is right. It only happens when you talk to the model once it is up and running. It seems the chat template is slightly different.

lvhan028 commented 1 month ago

We are working on the llama3 rope. Stay tuned.

lvhan028 commented 1 month ago

https://github.com/InternLM/lmdeploy/pull/2122 works for llama3.1
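
For anyone who wants to try the fix before a release, the usual route is to build the Docker image from the current main branch; a sketch, assuming the Dockerfile still lives at docker/Dockerfile in the repository:

git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy
docker build -t lmdeploy:local -f docker/Dockerfile .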

Ichigo3766 commented 1 month ago

@lvhan028 The issue is still present. I can use the completions endpoint fine, but chat_completions is not working: TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I have pulled the latest commit and built the Docker image locally. I am using hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4. Could you try this on your end and see whether chat_completions works? And if not, is there a way to make it work?
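
For context, a minimal chat request along these lines (assuming the default api_server port 23333 and the OpenAI-compatible /v1/chat/completions route) is the kind of call that hits the failing code path, while the plain /v1/completions route works:

curl http://0.0.0.0:23333/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
          "messages": [{"role": "user", "content": "Hello"}]
        }'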

Ichigo3766 commented 1 month ago

I was able to create a chat template myself and got it working.

feuler commented 1 month ago

I was able to create a chat template myself and got it working.

Can you maybe share it?

thiner commented 1 month ago

I was able to create a chat template myself and got it working.

Can you maybe share it?

Maybe this is helpful. https://ollama.com/library/llama3.1/blobs/8cf247399e57

Ichigo3766 commented 1 month ago

Oh, I thought I did share it. Will do it soon.

Ichigo3766 commented 1 month ago

Create a JSON file, paste the template below into it, and then, when loading the model, pass the file path via --chat-template (see the example serve command after the template).

{
    "model_name": "int",
    "system": "<|start_header_id|>system<|end_header_id|>\n\n",
    "meta_instruction": "A chat between a user and an assistant.",
    "eosys": "<|eot_id|>",
    "user": "<|start_header_id|>user<|end_header_id|>\n\n",
    "eoh": "<|eot_id|>",
    "assistant": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "eoa": "<|eot_id|>",
    "separator": "\n\n",
    "capability": "chat",
    "stop_words": ["<|eot_id|>"]
}
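
For example, assuming the template above is saved as llama31_chat_template.json (a file name chosen here for illustration), the server could be launched along these lines:

lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --backend turbomind \
    --model-format awq \
    --chat-template ./llama31_chat_template.json
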
lvhan028 commented 1 month ago

Hi, @Ichigo3766. The chat template is being added in PR #2123. We are going to support Llama 3.1 tool calling! Stay tuned.

zhangjinnan commented 1 month ago

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

The server starts successfully, but I get this error during conversation:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7985a007bf10

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
  |     raise exc
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
  |     await route.handle(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
  |     await self.app(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
  |     await response(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 258, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 504, in completion_stream_generator
    |     async for res in result_generator:
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 571, in generate
    |     prompt_input = await self._get_prompt_input(prompt,
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 524, in _get_prompt_input
    |     input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 600, in encode
    |     return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 366, in encode
    |     encoded = self.model.encode(s,
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2715, in encode
    |     encoded_inputs = self.encode_plus(
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3127, in encode_plus
    |     return self._encode_plus(
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 601, in _encode_plus
    |     batched_output = self._batch_encode_plus(
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 528, in _batch_encode_plus
    |     encodings = self._tokenizer.encode_batch(
    | TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
    +------------------------------------

My error is as follows:

(lmdeploy) root@intern-studio-073772:~# lmdeploy serve gradio /root/share/new_models/meta-llama/Meta-Llama-3
Meta-Llama-3-8B/              Meta-Llama-3-8B-Instruct/     Meta-Llama-3.1-405B-Instruct/ Meta-Llama-3___1-8B-Instruct/
(lmdeploy) root@intern-studio-073772:~# lmdeploy serve gradio /root/share/new_models/meta-llama/Meta-Llama-3___1-8B-Instruct/
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-07-26 00:42:08,394 - lmdeploy - WARNING - AutoConfig.from_pretrained failed for /root/share/new_models/meta-llama/Meta-Llama-3___1-8B-Instruct/. Exception: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

2024-07-26 00:42:47,944 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-07-26 00:42:47,944 - lmdeploy - INFO - input chat_template_config=ChatTemplateConfig(model_name=None, system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-07-26 00:42:48,145 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='base', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-07-26 00:42:48,145 - lmdeploy - INFO - model_source: hf_model
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-07-26 00:42:48,169 - lmdeploy - WARNING - The current version of `transformers` is transformers==4.41.1, which is lower than the required version transformers==4.42.3. Please upgrade to the required version.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-26 00:42:48,929 - lmdeploy - INFO - model_config:

[llama]
model_name = base
model_arch = LlamaForCausalLM
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 128256
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 128000
end_id = 128009
session_len = 131080
weight_type = bf16
rotary_embedding = 128
rope_theta = 500000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 17
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 131072
rope_scaling_factor = 8.0
use_dynamic_ntk = 0
use_logn_attn = 0
lora_policy = 
lora_r = 0
lora_scale = 0.0
lora_max_wo_r = 0
lora_rank_pattern = 
lora_scale_pattern = 

[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 131080.
2024-07-26 00:42:49,780 - lmdeploy - WARNING - get 227 model params
2024-07-26 00:44:04,509 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 8 MB
[TM][INFO] [BlockManager] max_block_count = 1571
[TM][INFO] [BlockManager] chunk_size = 1571
[TM][WARNING] No enough blocks for `session_len` (131080), `session_len` truncated to 100544.
[TM][INFO] LlamaBatch<T>::Start()
server is gonna mount on: http://0.0.0.0:6006
IMPORTANT: You are using gradio version 4.16.0, however version 4.29.0 is available, please upgrade.
--------
Running on local URL:  http://0.0.0.0:6006

To create a public link, set `share=True` in `launch()`.
2024-07-26 00:46:32,536 - lmdeploy - INFO - prompt='你是谁?', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=1830898949034541761, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[57668, 21043, 112471, 11571], adapter_name=None.
2024-07-26 00:46:32,536 - lmdeploy - INFO - session_id=1, history_tokens=0, input_tokens=4, max_new_tokens=512, seq_start=True, seq_end=False, step=0, prep=True
2024-07-26 00:46:32,537 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 4, max_q = 4, max_k = 4
[TM][INFO] ------------------------- step = 10 -------------------------
[TM][INFO] ------------------------- step = 20 -------------------------
[TM][INFO] ------------------------- step = 30 -------------------------
[TM][INFO] ------------------------- step = 40 -------------------------
[TM][INFO] ------------------------- step = 50 -------------------------
[TM][INFO] ------------------------- step = 60 -------------------------
[TM][INFO] ------------------------- step = 70 -------------------------
[TM][INFO] ------------------------- step = 80 -------------------------
[TM][INFO] ------------------------- step = 90 -------------------------
[TM][INFO] ------------------------- step = 100 -------------------------
[TM][INFO] ------------------------- step = 110 -------------------------
[TM][INFO] ------------------------- step = 120 -------------------------
[TM][INFO] ------------------------- step = 130 -------------------------
[TM][INFO] ------------------------- step = 140 -------------------------
[TM][INFO] ------------------------- step = 150 -------------------------
[TM][INFO] ------------------------- step = 160 -------------------------
[TM][INFO] ------------------------- step = 170 -------------------------
[TM][INFO] ------------------------- step = 180 -------------------------
[TM][INFO] ------------------------- step = 190 -------------------------
[TM][INFO] ------------------------- step = 200 -------------------------
[TM][INFO] ------------------------- step = 210 -------------------------
[TM][INFO] ------------------------- step = 220 -------------------------
[TM][INFO] ------------------------- step = 230 -------------------------
[TM][INFO] ------------------------- step = 240 -------------------------
[TM][INFO] ------------------------- step = 250 -------------------------
[TM][INFO] ------------------------- step = 260 -------------------------
[TM][INFO] ------------------------- step = 270 -------------------------
[TM][INFO] ------------------------- step = 280 -------------------------
[TM][INFO] ------------------------- step = 290 -------------------------
[TM][INFO] ------------------------- step = 300 -------------------------
[TM][INFO] ------------------------- step = 310 -------------------------
[TM][INFO] ------------------------- step = 320 -------------------------
[TM][INFO] ------------------------- step = 330 -------------------------
[TM][INFO] ------------------------- step = 340 -------------------------
[TM][INFO] ------------------------- step = 350 -------------------------
[TM][INFO] ------------------------- step = 360 -------------------------
[TM][INFO] ------------------------- step = 370 -------------------------
[TM][INFO] ------------------------- step = 380 -------------------------
[TM][INFO] ------------------------- step = 390 -------------------------
[TM][INFO] ------------------------- step = 400 -------------------------
[TM][INFO] ------------------------- step = 410 -------------------------
[TM][INFO] ------------------------- step = 420 -------------------------
[TM][INFO] ------------------------- step = 430 -------------------------
[TM][INFO] ------------------------- step = 440 -------------------------
[TM][INFO] ------------------------- step = 450 -------------------------
[TM][INFO] ------------------------- step = 460 -------------------------
[TM][INFO] ------------------------- step = 470 -------------------------
[TM][INFO] ------------------------- step = 480 -------------------------
[TM][INFO] ------------------------- step = 490 -------------------------
[TM][INFO] ------------------------- step = 500 -------------------------
[TM][INFO] ------------------------- step = 510 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-07-26 00:46:40,480 - lmdeploy - INFO - UN-register stream callback for 1
2024-07-26 00:46:42,003 - lmdeploy - INFO - prompt='什么情况', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[101879, 106041], adapter_name=None.
2024-07-26 00:46:42,003 - lmdeploy - INFO - session_id=1, history_tokens=517, input_tokens=2, max_new_tokens=512, seq_start=False, seq_end=False, step=0, prep=True
2024-07-26 00:46:42,003 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 3, max_q = 3, max_k = 519
[TM][INFO] ------------------------- step = 520 -------------------------
[TM][INFO] ------------------------- step = 530 -------------------------
[TM][INFO] ------------------------- step = 540 -------------------------
[TM][INFO] ------------------------- step = 550 -------------------------
[TM][INFO] ------------------------- step = 560 -------------------------
[TM][INFO] ------------------------- step = 570 -------------------------
[TM][INFO] ------------------------- step = 580 -------------------------
[TM][INFO] ------------------------- step = 590 -------------------------
[TM][INFO] ------------------------- step = 600 -------------------------
[TM][INFO] ------------------------- step = 610 -------------------------
[TM][INFO] ------------------------- step = 620 -------------------------
[TM][INFO] ------------------------- step = 630 -------------------------
[TM][INFO] ------------------------- step = 640 -------------------------
[TM][INFO] ------------------------- step = 650 -------------------------
[TM][INFO] ------------------------- step = 660 -------------------------
[TM][INFO] ------------------------- step = 670 -------------------------
[TM][INFO] ------------------------- step = 680 -------------------------
[TM][INFO] ------------------------- step = 690 -------------------------
[TM][INFO] ------------------------- step = 700 -------------------------
[TM][INFO] ------------------------- step = 710 -------------------------
[TM][INFO] ------------------------- step = 720 -------------------------
[TM][INFO] ------------------------- step = 730 -------------------------
[TM][INFO] ------------------------- step = 740 -------------------------
[TM][INFO] ------------------------- step = 750 -------------------------
[TM][INFO] ------------------------- step = 760 -------------------------
[TM][INFO] ------------------------- step = 770 -------------------------
[TM][INFO] ------------------------- step = 780 -------------------------
[TM][INFO] ------------------------- step = 790 -------------------------
[TM][INFO] ------------------------- step = 800 -------------------------
[TM][INFO] ------------------------- step = 810 -------------------------
[TM][INFO] ------------------------- step = 820 -------------------------
[TM][INFO] ------------------------- step = 830 -------------------------
[TM][INFO] ------------------------- step = 840 -------------------------
[TM][INFO] ------------------------- step = 850 -------------------------
[TM][INFO] ------------------------- step = 860 -------------------------
[TM][INFO] ------------------------- step = 870 -------------------------
[TM][INFO] ------------------------- step = 880 -------------------------
[TM][INFO] ------------------------- step = 890 -------------------------
[TM][INFO] ------------------------- step = 900 -------------------------
[TM][INFO] ------------------------- step = 910 -------------------------
[TM][INFO] ------------------------- step = 920 -------------------------
[TM][INFO] ------------------------- step = 930 -------------------------
[TM][INFO] ------------------------- step = 940 -------------------------
[TM][INFO] ------------------------- step = 950 -------------------------
[TM][INFO] ------------------------- step = 960 -------------------------
[TM][INFO] ------------------------- step = 970 -------------------------
[TM][INFO] ------------------------- step = 980 -------------------------
[TM][INFO] ------------------------- step = 990 -------------------------
[TM][INFO] ------------------------- step = 1000 -------------------------
[TM][INFO] ------------------------- step = 1010 -------------------------
[TM][INFO] ------------------------- step = 1020 -------------------------
[TM][INFO] ------------------------- step = 1030 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-07-26 00:46:48,420 - lmdeploy - INFO - UN-register stream callback for 1

2024-07-26 00:47:04,889 - lmdeploy - INFO - prompt='挂了', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[116796, 35287], adapter_name=None.
2024-07-26 00:47:04,889 - lmdeploy - INFO - session_id=1, history_tokens=1032, input_tokens=2, max_new_tokens=512, seq_start=False, seq_end=False, step=0, prep=True
2024-07-26 00:47:04,889 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 3, max_q = 3, max_k = 1034
[TM][INFO] ------------------------- step = 1040 -------------------------
[TM][INFO] ------------------------- step = 1050 -------------------------
[TM][INFO] ------------------------- step = 1060 -------------------------
[TM][INFO] ------------------------- step = 1070 -------------------------
[TM][INFO] ------------------------- step = 1080 -------------------------
[TM][INFO] ------------------------- step = 1090 -------------------------
[TM][INFO] ------------------------- step = 1100 -------------------------
[TM][INFO] ------------------------- step = 1110 -------------------------
[TM][INFO] ------------------------- step = 1120 -------------------------
[TM][INFO] ------------------------- step = 1130 -------------------------
[TM][INFO] ------------------------- step = 1140 -------------------------
[TM][INFO] ------------------------- step = 1150 -------------------------
[TM][INFO] ------------------------- step = 1160 -------------------------
[TM][INFO] ------------------------- step = 1170 -------------------------
[TM][INFO] ------------------------- step = 1180 -------------------------
[TM][INFO] ------------------------- step = 1190 -------------------------
[TM][INFO] ------------------------- step = 1200 -------------------------
[TM][INFO] ------------------------- step = 1210 -------------------------
[TM][INFO] ------------------------- step = 1220 -------------------------
[TM][INFO] ------------------------- step = 1230 -------------------------
[TM][INFO] ------------------------- step = 1240 -------------------------
[TM][INFO] ------------------------- step = 1250 -------------------------
[TM][INFO] ------------------------- step = 1260 -------------------------
[TM][INFO] ------------------------- step = 1270 -------------------------
[TM][INFO] ------------------------- step = 1280 -------------------------
[TM][INFO] ------------------------- step = 1290 -------------------------
[TM][INFO] ------------------------- step = 1300 -------------------------
[TM][INFO] ------------------------- step = 1310 -------------------------
[TM][INFO] ------------------------- step = 1320 -------------------------
[TM][INFO] ------------------------- step = 1330 -------------------------
[TM][INFO] ------------------------- step = 1340 -------------------------
[TM][INFO] ------------------------- step = 1350 -------------------------
[TM][INFO] ------------------------- step = 1360 -------------------------
[TM][INFO] ------------------------- step = 1370 -------------------------
[TM][INFO] ------------------------- step = 1380 -------------------------
[TM][INFO] ------------------------- step = 1390 -------------------------
[TM][INFO] ------------------------- step = 1400 -------------------------
[TM][INFO] ------------------------- step = 1410 -------------------------
[TM][INFO] ------------------------- step = 1420 -------------------------
[TM][INFO] ------------------------- step = 1430 -------------------------
[TM][INFO] ------------------------- step = 1440 -------------------------
[TM][INFO] ------------------------- step = 1450 -------------------------
[TM][INFO] ------------------------- step = 1460 -------------------------
[TM][INFO] ------------------------- step = 1470 -------------------------
[TM][INFO] ------------------------- step = 1480 -------------------------
[TM][INFO] ------------------------- step = 1490 -------------------------
[TM][INFO] ------------------------- step = 1500 -------------------------
[TM][INFO] ------------------------- step = 1510 -------------------------
[TM][INFO] ------------------------- step = 1520 -------------------------
[TM][INFO] ------------------------- step = 1530 -------------------------
[TM][INFO] ------------------------- step = 1540 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-07-26 00:47:11,354 - lmdeploy - INFO - UN-register stream callback for 1
Ichigo3766 commented 1 month ago

@lvhan028 hey! I did see that PR and it has been merged. I pulled the changes locally and built the Docker image, but it still gave me that error. It looks like something is missing in the PR, or maybe something is off with the provided AWQ quant. So, creating my own chat template fixed the issue.

lvhan028 commented 1 month ago

It works on our side. The model evaluation with OpenCompass, using lmdeploy as the accelerator, passed. Can you paste the error information here?

aisensiy commented 1 month ago

I have a similar error message after deploying the latest lmdeploy release.