Closed. Hzzone closed this issue 5 months ago.
If you start the server with `--session-len`, you can't pass `max_tokens` with the same value as `session_len`. This is because of the following logic: if the number of input tokens plus `max_tokens` is greater than `session_len`, the engine simply returns an empty output with `finish_reason='length'`. Whereas, if you don't pass `max_tokens`, the potential output length is `session_len` minus the history context length.
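A minimal sketch of that check, under assumed names (`generate`, `run_model`, and the structure here are illustrative, not lmdeploy's actual internals):

```python
def run_model(token_ids, max_new_tokens):
    # Stand-in for the actual decoding loop.
    return 'some generated text'

def generate(input_token_ids, max_tokens, session_len):
    # Sketch of the length check described above (hypothetical names).
    if max_tokens is None:
        # No max_tokens given: the budget is the session minus the prompt.
        max_tokens = session_len - len(input_token_ids)
    if len(input_token_ids) + max_tokens > session_len:
        # Over-budget request: empty output with finish_reason='length'.
        return '', 'length'
    return run_model(input_token_ids, max_new_tokens=max_tokens), 'stop'

# A 100-token prompt plus max_tokens=8192 exceeds session_len=8192,
# so the output is empty; omitting max_tokens succeeds.
print(generate(list(range(100)), 8192, 8192))  # ('', 'length')
print(generate(list(range(100)), None, 8192))  # ('some generated text', 'stop')
```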
Thanks for your answer.
Checklist
Describe the bug
This is with the latest lmdeploy 0.4.1. The bug is as the title describes: the response is an empty string with `finish_reason='length'` once `max_tokens` is set. It works well without `max_tokens`. More details are shown in the code below.
Reproduction
Run the server as follows.
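The exact launch command isn't preserved in the report. Assuming the standard `lmdeploy serve api_server` CLI, an invocation along these lines matches the described setup; the `--session-len` value here is illustrative:

```bash
# Hypothetical launch command; the session length actually used is not shown.
lmdeploy serve api_server liuhaotian/llava-v1.6-34b --session-len 8192
```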
Reproduction code:

```python
model_name = 'liuhaotian/llava-v1.6-34b'
client = llm_clients[model_name]  # pre-built OpenAI-compatible client for this model
response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{'role': 'user', 'content': [
        {'type': 'text', 'text': 'who are you'},
    ]}],
    # Setting max_tokens triggers the empty output with finish_reason='length';
    # omitting it works fine.
    max_tokens=8192,
)
response
```
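Given the explanation above, a workaround is to keep the prompt length plus `max_tokens` within `session_len`. A sketch of that, assuming the server's OpenAI-compatible endpoint on lmdeploy's default port; `estimate_prompt_tokens` is a hypothetical helper, since an exact count needs the model's own tokenizer:

```python
from openai import OpenAI

SESSION_LEN = 8192  # illustrative; must match the server's --session-len
client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')  # assumed endpoint

def estimate_prompt_tokens(messages):
    # Hypothetical helper: rough ~4-characters-per-token estimate.
    text = ''.join(part['text']
                   for m in messages
                   for part in m['content'] if part['type'] == 'text')
    return max(1, len(text) // 4)

messages = [{'role': 'user',
             'content': [{'type': 'text', 'text': 'who are you'}]}]

# Cap max_tokens so prompt + output stays within the session budget,
# with a small safety margin for the chat template overhead.
safe_max_tokens = SESSION_LEN - estimate_prompt_tokens(messages) - 64

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=messages,
    max_tokens=safe_max_tokens,
)
print(response.choices[0].message.content)
```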
Environment
Error traceback
No response