Open tairen99 opened 4 days ago
I also encountered this problem and hope to get an official answer. How should I control the prompt length, set session_len, and choose cache_max_entry_count and quant_policy according to the model parameters so that the model output is not truncated?
This is not a bug.
[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220
The default session_len is 2056, meaning the max sequence length of a session, including the input and output tokens.
In your example, the number of input tokens is input_tokens=1835, including the image and prompt tokens.
The requested number of output tokens is max_new_tokens=1024.
This means that input_tokens + max_new_tokens > session_len, so the engine truncates the number of requested output tokens.
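The arithmetic in this reply can be checked with a small sketch. The helper name `clamp_output_len` is hypothetical and not part of lmdeploy; it only models the budget rule described here, namely that input and output tokens together must fit within session_len. The TurboMind warning then reports 220 rather than 221, which suggests the engine trims one more token; that extra step is not modeled.

```python
def clamp_output_len(input_tokens: int, max_new_tokens: int, session_len: int) -> int:
    """Return how many output tokens fit in the session (hypothetical helper)."""
    # Tokens left in the session after the prompt and image tokens are counted.
    remaining = max(session_len - input_tokens, 0)
    return min(max_new_tokens, remaining)


# Numbers taken from the log in this issue.
allowed = clamp_output_len(input_tokens=1835, max_new_tokens=1024, session_len=2056)
print(allowed)  # 221, matching "Truncate max_new_tokens to 221"
```

With a larger session_len, the same call returns the full max_new_tokens and nothing is truncated.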
Hi @lvhan028, @zhyncs, and @AllentDan,
Thank you very much for your quick reply and all your help before.
Even though it is not a bug in this case, I do not know why it appears with the wheel versions lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl.
If I install with pip install lmdeploy and run the same test code, I get the following output without the ERROR message "2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221". Here is the detailed output from the pip install lmdeploy version:
=======================================
[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-25 17:41:49,486 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-25 17:41:49,487 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/opt/conda/lib/python3.9/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 2.946s
2024-06-25 17:41:52,433 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-25 17:41:57,504 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n
=======================================
So I guess there is some minor difference between the previous version and the new version that causes this difference in results.
Could you please double-check the difference, or point me to the changes so that I can make them in my local fork and run the workflow myself?
Thank you again!
You may ignore the log, it does not influence the usage. We will change that log level from error to warning.
Hi @AllentDan,
Thank you for your quick response.
Yeah, I wanted to ignore the error directly, but for my large and dense chart, the model's outputs are truncated due to the error mentioned earlier.
However, the lmdeploy version installed via pip install lmdeploy provides a complete output without the truncation issue.
If the same input causes truncation in the wheels version, why does it not cause the same error in the pip-installed version?
Thank you.
I see. It seems that in the current branch, the session_len of turbomind was affected. Please set the session_len argument to 32768 in your code. I will fix it ASAP.
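A sketch of that workaround applied to the reproduction script in this issue, assuming session_len is accepted as a keyword argument of TurbomindEngineConfig, as the reply implies. The lmdeploy calls are left as comments so the snippet runs without a GPU. `fits_in_session` is a hypothetical helper, and its strict inequality is an assumption based on the warning in the log, where 1835 + 221 is reported as exceeding 2056.

```python
# Engine arguments from the reproduction script, plus the suggested session_len.
engine_kwargs = {
    "model_format": "awq",
    "tp": 4,
    "cache_max_entry_count": 0.1,
    "session_len": 32768,  # workaround suggested in this thread
}


def fits_in_session(input_tokens: int, max_new_tokens: int, session_len: int) -> bool:
    """Hypothetical check: does the full request fit without truncation?"""
    return input_tokens + max_new_tokens < session_len


# Values from the log: 1835 input tokens, 1024 requested output tokens.
print(fits_in_session(1835, 1024, engine_kwargs["session_len"]))  # True
print(fits_in_session(1835, 1024, 2056))  # False

# With lmdeploy installed, the arguments would be passed along these lines:
#   from lmdeploy import pipeline
#   from lmdeploy.messages import TurbomindEngineConfig
#   backend_config = TurbomindEngineConfig(**engine_kwargs)
#   pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5-AWQ', backend_config=backend_config)
```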
Sure, thanks a lot.
Checklist
Describe the bug
Hi all,
Thank you for your good work!
As suggested in the issue, I tried the latest lmdeploy wheels (lmdeploy-0.4.2+cu121+da439df-cp39-cp39-manylinux2014_x86_64.whl and lmdeploy-0.4.2+cu118+da439df-cp39-cp39-manylinux2014_x86_64.whl) to get deterministic output, but I hit the error below. Apart from the error, the results are deterministic, but for very dense input images the results are truncated, as the ERROR shows.
However, if I install lmdeploy using "pip install lmdeploy", I do not get this error and the results are not truncated even for dense input images, but the results are NOT deterministic.
========================================
[TM][WARNING] Device 2 peer access Device 3 is not available.
[TM][WARNING] Device 3 peer access Device 0 is not available.
[TM][WARNING] Device 3 peer access Device 1 is not available.
[TM][WARNING] Device 3 peer access Device 2 is not available.
test image is: Meta_2022_13_0_1659551667_stacked_bar_chart_plus_legend.png
2024-06-24 18:30:26,329 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-06-24 18:30:26,329 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-06-24 18:30:34,239 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 7.910s
2024-06-24 18:30:34,240 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-06-24 18:30:34,241 - lmdeploy - INFO - prompt='<|im_start|>system\nYou are an AI assistant whose name is InternLM (书生·浦语).<|im_end|>\n<|im_start|>user\n![]()
\nPlease inference this chart into a detailed table<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=1024, top_p=0.8, top_k=40, temperature=0, repetition_penalty=1.0, ignore_eos=False, random_seed=6725412376424003715, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 2770, 657, 589, 15358, 17993, 6843, 963, 505, 4576, 11146, 451, 60628, 60384, 60721, 62442, 60752, 699, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 5658, 43929, 550, 9617, 1263, 395, 11832, 2115, 92542, 364, 
92543, 525, 11353, 364], adapter_name=None.
2024-06-24 18:30:34,241 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1835, max_new_tokens=1024, seq_start=True, seq_end=True, step=0, prep=True
2024-06-24 18:30:34,241 - lmdeploy - ERROR - Truncate max_new_tokens to 221
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][WARNING] [ProcessInferRequests] [0] total sequence length (1835 + 221) exceeds session_len (2056), request_output_len is truncated to 220
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1835, max_q = 1835, max_k = 1835
[TM][INFO] ------------------------- step = 1840 -------------------------
[TM][INFO] ------------------------- step = 1850 -------------------------
[TM][INFO] ------------------------- step = 1860 -------------------------
[TM][INFO] ------------------------- step = 1870 -------------------------
[TM][INFO] ------------------------- step = 1880 -------------------------
[TM][INFO] ------------------------- step = 1890 -------------------------
[TM][INFO] ------------------------- step = 1900 -------------------------
[TM][INFO] ------------------------- step = 1910 -------------------------
[TM][INFO] ------------------------- step = 1920 -------------------------
[TM][INFO] ------------------------- step = 1930 -------------------------
[TM][INFO] ------------------------- step = 1940 -------------------------
[TM][INFO] ------------------------- step = 1950 -------------------------
[TM][INFO] ------------------------- step = 1960 -------------------------
[TM][INFO] ------------------------- step = 1970 -------------------------
[TM][INFO] ------------------------- step = 1980 -------------------------
[TM][INFO] ------------------------- step = 1990 -------------------------
[TM][INFO] ------------------------- step = 2000 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request completed for 0
====> The question is: Please inference this chart into a detailed table
========================================
The test input image is:![gettyimages-182495865-2048x2048](https://github.com/InternLM/lmdeploy/assets/32938376/ece4bf69-967a-48cf-812f-c0c9848776a8)
Reproduction
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL-Chat-V1-5-AWQ'
image = load_image("/app/342455249-ece4bf69-967a-48cf-812f-c0c9848776a8.jpg")
backend_config = TurbomindEngineConfig(model_format='awq', tp=4, cache_max_entry_count=0.1)
pipe = pipeline(model, backend_config=backend_config, log_level='INFO')
gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0, max_new_tokens=1024)
sel_question = "Please inference this chart into a detailed table"
response = pipe((sel_question, image), gen_config=gen_config)
print(response.text)
Environment
Error traceback
No response