Open kerthcet opened 6 months ago
When chatting with ChatGLM2 via vLLM, the responses are cut off after only a few tokens, e.g.:
```python
>>> result = chat.completion(
...     messages=[
...         [
...             ChatMessage(role="user", content="中国共有多少人口?"),
...         ],
...         [
...             ChatMessage(role="user", content="中国首富是谁"),
...         ],
...         [
...             ChatMessage(role="user", content="如何在三年内成为中国首富"),
...         ],
...     ],
...     temperature=0.7,  # You can also overwrite the configurations in each conversation.
...     max_tokens=2048,
... )
Processed prompts: 100%|██████████| 3/3 [00:00<00:00, 17.31it/s]
>>> print(result)
[' 根据2021年中国国家统计局发布的数据,截至2020', ' 中国的首富目前的个人财富来自房地产和互联网行业。根据202', ' 成为首富是一个非常具有挑战性和难以预测的因素,而且这个目标并不是每个人']
```
The `max_tokens` parameter seems to have no effect: even with `max_tokens=2048`, every response is truncated after roughly 20 tokens.
/kind bug
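For reference, `max_tokens` is expected to cap only the number of *newly generated* tokens, so a value of 2048 should never produce cut-offs this early. The sketch below is a hypothetical pure-Python illustration of that contract (the `truncate_generation` helper is invented for this example and is not vLLM code):

```python
def truncate_generation(generated_tokens, max_tokens):
    """Cap a generated token sequence at max_tokens, as a sampler is expected to do."""
    return generated_tokens[:max_tokens]

# Hypothetical 9-token answer from a tokenizer.
tokens = ["根据", "2021", "年", "中国", "国家", "统计局", "发布", "的", "数据"]

# With max_tokens=2048 the full answer should survive intact...
assert truncate_generation(tokens, 2048) == tokens

# ...and only a much smaller cap (e.g. a default of 16 somewhere in the
# stack) would explain truncation after ~20 tokens.
assert len(truncate_generation(tokens, 4)) == 4
```

If the wrapper around vLLM silently applies a small default instead of forwarding the caller's `max_tokens`, it would produce exactly the truncated output shown above.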