InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] lmdeploy - ERROR - Truncate max_new_tokens to 221 #1841

Open tairen99 opened 4 days ago

tairen99 commented 4 days ago

Checklist

Describe the bug

Hi all,

Thank you for your good work!

As suggested in the issue, I tried the latest lmdeploy wheels (lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl) to get deterministic output, but I hit the error shown below.

Apart from the error, the results are deterministic, but for very dense input images the results are truncated, as the ERROR message indicates.

However, if I install lmdeploy with "pip install lmdeploy", I do not get this error and the results are not truncated even for dense input images, but the results are NOT deterministic.

========================================

[TM][WARNING] Device 2 peer access Device 3 is not available. [TM][WARNING] Device 3 peer access Device 0 is not available. [TM][WARNING] Device 3 peer access Device 1 is not available. [TM][WARNING] Device 3 peer access Device 2 is not available. test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png 2024-06-24 18:30:26,329 - lmdeploy - INFO - start ImageEncoder._forward_loop 2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images. 2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images. 2024-06-24 18:30:34,239 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 7.910s 2024-06-24 18:30:34,240 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images. 2024-06-24 18:30:34,241 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=6725412376424003715, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 92543, 525, 11353, 364], adapter_name=None. 2024-06-24 18:30:34,241 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True 2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221 [TM][INFO] [forward] Enqueue requests [TM][INFO] [forward] Wait for requests to complete ... [TM][INFO] [ProcessInferRequests] Request for 0 received. 
[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request completed for 0
====> The question is: Please inference this chart into a detailed table

========================================

The test input image is: gettyimages-182495865-2048x2048

Reproduction

from lmdeploy import pipeline, GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL-Chat-V1-5-AWQ'
image = load_image("/app/342455249-ece4bf69-967a-48cf-812f-c0c9848776a8.jpg")
backend_config = TurbomindEngineConfig(model_format='awq', tp=4, cache_max_entry_count=0.1)
pipe = pipeline(model, backend_config=backend_config, log_level='INFO')
gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0, max_new_tokens=1024)
sel_question = "Please inference this chart into a detailed table"
response = pipe((sel_question, image), gen_config=gen_config)
print(response.text)

Environment

Server:  4 NVIDIA Tesla T4 GPUs, each has 16 GB GPU memory
Memory: 191 GB
Number of CPUs: 48
Docker Environment: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
Python version: 3.9.19

Error traceback

No response

RayTang88 commented 4 days ago

I also encountered this problem and hope to get an official answer. How should I control the prompt length, set session_len, and choose cache_max_entry_count and quant_policy according to the model parameters so that the model output is not truncated?

lvhan028 commented 4 days ago

This is not a bug.

[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220

The default session_len is 2056. It is the maximum sequence length of a session, counting both input and output tokens.

In your example, the number of input tokens is input_tokens=1835, including the image and prompt tokens, and the requested number of output tokens is max_new_tokens=1024.

Since input_tokens + max_new_tokens = 1835 + 1024 = 2859 > session_len, the engine truncates the number of requested output tokens to what still fits in the session (2056 - 1835 = 221).
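The arithmetic behind the truncation can be sketched as follows. This is only an illustration of the budget check described above, not lmdeploy's actual implementation; the function name `clamp_max_new_tokens` is hypothetical.

```python
def clamp_max_new_tokens(input_tokens, max_new_tokens, session_len, history_tokens=0):
    """Return how many new tokens still fit in the session (hypothetical helper)."""
    # Tokens left in the session after the history and the current input.
    remaining = session_len - history_tokens - input_tokens
    # Never request more than what fits, and never go below zero.
    return max(0, min(max_new_tokens, remaining))


# Values taken from the log in this issue.
print(clamp_max_new_tokens(input_tokens=1835, max_new_tokens=1024, session_len=2056))
```

With the numbers from the log this gives 221, which matches the "Truncate max_new_tokens to 221" message. The TurboMind warning then reports 220, one token fewer; the sketch does not model that second adjustment.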

tairen99 commented 3 days ago

Hi @lvhan028, @zhyncs, and @AllentDan,

Thank you very much for your quick reply and all your help before.

Even though this is not a bug, I do not understand why the message appears only with the wheel versions lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl.

If I install with pip install lmdeploy and run the same test code, I get the following output without the ERROR message "2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221". See the output from the pip install lmdeploy version for details:

=======================================

[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-25 17:41:49,486 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-25 17:41:49,487 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 2.946s
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-25 17:41:57,504 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=15886905969490819590, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 92543, 525, 11353, 364], adapter_name=None. 2024-06-25 17:41:57,504 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True [TM][INFO] Set logger level by INFO [TM][INFO] Set logger level by INFO [TM][INFO] Set logger level by INFO [TM][INFO] Set logger level by INFO [TM][INFO] [forward] Enqueue requests [TM][INFO] [forward] Wait for requests to complete ... [TM][INFO] Set logger level by INFO [TM][WARNING] [ProcessInferRequests] Request for 0 received. 
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] Set logger level by INFO
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] Set logger level by INFO
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] ------------------------- step = 2010 -------------------------
[TM][INFO] ------------------------- step = 2020 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request complete for 0, code 0
====> The question is: Please inference this chart into a detailed table

=======================================

So I guess there is some minor difference between the previous version and the new version that causes this difference in results.

Could you please double-check the difference? Or could you point me to the changes so that I can apply them in my local fork and run the workflow myself?

Thank you again!

AllentDan commented 3 days ago

You may ignore the log; it does not affect usage. We will change that log level from error to warning.

tairen99 commented 3 days ago

Hi @AllentDan,

Thank you for your quick response.

Yeah, I wanted to ignore the error directly, but for my large and dense chart, the model's outputs are truncated due to the error mentioned earlier.

However, the lmdeploy version installed via pip install lmdeploy provides a complete output without the truncation issue.

If the same input causes truncation in the wheels version, why does it not cause the same error in the pip-installed version?

Thank you.

AllentDan commented 3 days ago

I see. It seems that in the current branch the session_len of turbomind was affected. Please set the session_len argument to 32768 in your code. I will fix it ASAP.
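A minimal sketch of this workaround, assuming session_len is passed as a keyword argument to TurbomindEngineConfig. The other settings are copied from the reproduction script; the helper names `build_engine_kwargs` and `run_chart_query` are hypothetical, and the lmdeploy imports are deferred so the settings can be inspected on a machine without lmdeploy or a GPU.

```python
def build_engine_kwargs(session_len=32768):
    """Engine settings from the reproduction script plus an explicit session_len."""
    return dict(
        model_format='awq',
        tp=4,
        cache_max_entry_count=0.1,
        session_len=session_len,  # the value suggested in this thread
    )


def run_chart_query(image_path, question, max_new_tokens=1024):
    """Hypothetical wrapper around the reproduction script."""
    # Imported lazily: lmdeploy and GPUs are only needed when this is called.
    from lmdeploy import pipeline, GenerationConfig
    from lmdeploy.messages import TurbomindEngineConfig
    from lmdeploy.vl import load_image

    backend_config = TurbomindEngineConfig(**build_engine_kwargs())
    pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5-AWQ',
                    backend_config=backend_config, log_level='INFO')
    gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0,
                                  max_new_tokens=max_new_tokens)
    response = pipe((question, load_image(image_path)), gen_config=gen_config)
    return response.text
```

With session_len=32768, the request from this issue (1835 input tokens plus 1024 requested output tokens, 2859 in total) fits inside the session, so max_new_tokens should no longer need to be truncated.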

tairen99 commented 3 days ago

Sure, thanks a lot.