chatchat-space / Langchain-Chatchat

Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Apache License 2.0

[BUG] Model deployed with the vLLM Docker image does not stop automatically after finishing a conversation; the output keeps emitting <|endoftext|><|im_end|> <|im_start|> #3834

Closed huangshengfu closed 4 months ago

huangshengfu commented 5 months ago

After the model finishes answering a question, the chat does not end; it keeps repeatedly printing <|im_start|>.

(screenshot: the reply keeps repeating <|im_start|> after the answer is complete)

Docker run command:

docker run --runtime nvidia --gpus all --name vllm-qwen14b \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -v /data1/Download/models/Qwen-14B-Chat-Int4:/data/shared/Qwen/Qwen-Chat \
    -p 8901:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model /data/shared/Qwen/Qwen-Chat --max-model-len 2048 --trust-remote-code \
    --tensor-parallel-size 2 --gpu-memory-utilization 0.7 --api-key "xxxxx"

What parameters should I set to make generation stop properly? I don't know how to pass stop_token_ids.

Expected Result: generation should end normally once the answer is complete.

Actual Result: as shown in the screenshot above.

Environment Information: (not provided)

Additional Information: (none provided)

dosubot[bot] commented 5 months ago

To address the issue where your Docker-deployed model keeps repeating <|im_start|> after completing a conversation, consider adjusting the stop parameter in the LLMSingleActionAgent initialization. The stop list specifies strings at which the model should cease generating further output. If <|im_start|> is the token that marks the start of the unwanted repetition, add it to the stop list like so:

from langchain.agents import LLMSingleActionAgent

# llm_chain, output_parser and tool_names come from the existing agent setup
# (see server/chat/agent_chat.py); only the stop list changes here.
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:", "Observation", "<|im_start|>"],  # add "<|im_start|>" to the stop list
    allowed_tools=tool_names,
)

This modification instructs the LLMSingleActionAgent to halt generation as soon as it encounters any string listed in stop, including the newly added <|im_start|> token.

This solution assumes you have access to, and can modify, the initialization of LLMSingleActionAgent in your deployment. If you are looking for a way to pass stop_token_ids directly as Docker command parameters, note that this is generally not supported through docker run alone; stop tokens are typically supplied per request, or by modifying the application code that builds those requests.
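One client-side option: vLLM's OpenAI-compatible server accepts the standard stop field, and recent versions also accept stop_token_ids as a vLLM-specific extra field in the request body, so they can be set per request instead of at docker run time. Below is a minimal sketch using the openai Python client; the base_url, api_key, and the Qwen token IDs (151643 for <|endoftext|>, 151644 for <|im_start|>, 151645 for <|im_end|>) are assumptions that should be verified against your deployment and tokenizer.

from openai import OpenAI

# Assumed endpoint/key matching the docker run command above; adjust as needed.
client = OpenAI(base_url="http://localhost:8901/v1", api_key="xxxxx")

response = client.chat.completions.create(
    model="/data/shared/Qwen/Qwen-Chat",
    messages=[{"role": "user", "content": "你好"}],
    stop=["<|im_end|>", "<|endoftext|>"],  # standard OpenAI stop strings
    # vLLM-specific extra field; token IDs are the usual Qwen values -- verify with your tokenizer
    extra_body={"stop_token_ids": [151643, 151644, 151645]},
)
print(response.choices[0].message.content)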

If you're encountering specific errors when running your Docker command or if there are other methods you've tried without success, providing more details could help in offering a more tailored solution.

#### Details
**Sources**
- [server/chat/agent_chat.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/chat/agent_chat.py)

**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

huangshengfu commented 4 months ago

Which file should this configuration be added to?

zRzRzRzRzRzRzR commented 4 months ago

With vLLM you should set the special tokens yourself.
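For reference, when driving vLLM directly from Python rather than through the OpenAI-compatible server, the special tokens can be supplied via stop_token_ids on SamplingParams. A minimal sketch under the same assumptions as above (local model path and Qwen token IDs should be checked against your setup):

from vllm import LLM, SamplingParams

# Assumed local model path; the Qwen IDs for <|endoftext|>, <|im_start|>, <|im_end|>
# are the commonly used values -- confirm them with the model's tokenizer.
sampling_params = SamplingParams(
    max_tokens=512,
    stop_token_ids=[151643, 151644, 151645],
)
# Depending on the vLLM version, the GPTQ Int4 checkpoint may also need quantization="gptq".
llm = LLM(model="/data/shared/Qwen/Qwen-Chat", trust_remote_code=True, max_model_len=2048)
outputs = llm.generate(["你好,请介绍一下你自己。"], sampling_params)
print(outputs[0].outputs[0].text)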