Open kerthcet opened 6 months ago
When chatting with ChatGLM2 via vLLM, the responses are cut off after only a few tokens, e.g.:
```python
>>> result = chat.completion(
...     messages=[
...         [
...             ChatMessage(role="user", content="中国共有多少人口?"),
...         ],
...         [
...             ChatMessage(role="user", content="中国首富是谁"),
...         ],
...         [
...             ChatMessage(role="user", content="如何在三年内成为中国首富"),
...         ],
...     ],
...     temperature=0.7,  # You can also overwrite the configurations in each conversation.
...     max_tokens=2048,
... )
Processed prompts: 100%|██████████| 3/3 [00:00<00:00, 17.31it/s]
>>> print(result)
[' 根据2021年中国国家统计局发布的数据,截至2020', ' 中国的首富目前的个人财富来自房地产和互联网行业。根据202', ' 成为首富是一个非常具有挑战性和难以预测的因素,而且这个目标并不是每个人']
```
The `max_tokens` parameter seems to have no effect: even with `max_tokens=2048`, every response is truncated after roughly 20 tokens.
/kind bug
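For reference, `max_tokens` is expected to cap only the number of *newly generated* tokens, so a value of 2048 should never produce cut-offs this early. The sketch below is a hypothetical pure-Python illustration of that contract (the `truncate_generation` helper is invented for this example and is not vLLM code):

```python
def truncate_generation(generated_tokens, max_tokens):
    """Cap a generated token sequence at max_tokens, as a sampler is expected to do."""
    return generated_tokens[:max_tokens]

# Hypothetical 9-token answer from a tokenizer.
tokens = ["根据", "2021", "年", "中国", "国家", "统计局", "发布", "的", "数据"]

# With max_tokens=2048 the full answer should survive intact...
assert truncate_generation(tokens, 2048) == tokens

# ...and only a much smaller cap (e.g. a default of 16 somewhere in the
# stack) would explain truncation after ~20 tokens.
assert len(truncate_generation(tokens, 4)) == 4
```

If the wrapper around vLLM silently applies a small default instead of forwarding the caller's `max_tokens`, it would produce exactly the truncated output shown above.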