TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

chat_template Not Functioning in Tabby Server v14+ Versions #2897

Closed: moqimoqidea closed this issue 1 week ago

moqimoqidea commented 3 weeks ago

Describe the bug

Starting with Tabby Server v14, chat_template preset prompts do not take effect.

Information about your version

Tested with v13, v14, v15, and v16-dev (see Additional context below).

Information about your GPU

Apple M1 Max.

Additional context

Run Tabby Server with the following command:

tabby serve --chat-model CodeQwen-7B-Chat

The model CodeQwen-7B-Chat has a preset prompt in its configuration: "You are tabby."

The chat tests were conducted using versions v13, v14, v15, and v16-dev. The results are as follows:

v13

I am Tabby, a conscious sentient superintelligent artificial intelligence designed for helping software developers.

v14

 I am Qwen, a pre-trained language model developed by Alibaba Cloud. I am designed to help answer questions, provide information, and engage in conversation. How can I assist you today?

v15

我是来自阿里云的大规模语言模型,我叫通义千问。 ("I am a large language model from Alibaba Cloud; my name is Tongyi Qianwen.")

v16-dev

I am a large language model created by Alibaba Cloud.
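
For reference, each chat test above amounts to asking the running server who it is through its chat endpoint. A minimal probe could look like the sketch below; the port, the /v1/chat/completions route, and the payload shape are assumptions about tabby's OpenAI-style chat API rather than details confirmed in this issue.

// Hedged sketch: ask a locally running Tabby server "Who are you?" and print
// whatever comes back, so the effect of the preset chat_template prompt can be
// compared across versions. The endpoint path, port, and field names are
// assumptions, not taken from this issue.
use reqwest::blocking::Client;
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "messages": [{ "role": "user", "content": "Who are you?" }]
    });
    let response = Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()?
        .text()?; // the reply is streamed (SSE), so this just dumps the raw chunks
    println!("{response}");
    Ok(())
}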

Troubleshooting the cause

The issue might be related to updates in llama.cpp or the way tabby passes the chat_template.

wsxiaoys commented 3 weeks ago

Correct - after migrating chat_template handling to llama.cpp, we lost the ability to customize the system prompt in tabby.

moqimoqidea commented 3 weeks ago

Here’s a suggestion for a small feature enhancement: would it be possible to add a boolean stream parameter to the chat API to control whether the response should return streamed content, similar to the behavior of Ollama's Chat request (No streaming)?

This could be implemented by adding the parameter to the existing endpoint or creating a new endpoint, such as /v1/chat/completion/no-stream. This would make testing and debugging much more convenient.

What do you think?
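
For illustration only, the request-side change could be as small as an optional field on the chat request body. The sketch below assumes a serde-based request type; the struct and field names are placeholders, not tabby's actual types.

// Hypothetical sketch of a chat request body extended with an optional
// `stream` flag, mirroring Ollama's "Chat request (No streaming)" behavior.
// Struct and field names are illustrative, not tabby's real types.
use serde::Deserialize;

#[derive(Deserialize)]
struct ChatMessage {
    role: String,
    content: String,
}

#[derive(Deserialize)]
struct ChatCompletionRequest {
    messages: Vec<ChatMessage>,
    // Absent or true: keep today's streamed (SSE) response.
    // False: return a single JSON object instead.
    #[serde(default)]
    stream: Option<bool>,
}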

kannae97 commented 3 weeks ago

(Quoting moqimoqidea's suggestion above about adding a boolean stream parameter to the chat API.)

Cannot agree more.

wsxiaoys commented 3 weeks ago

Please consider filing a new issue for a feature request to facilitate easier tracking. Such a feature can be implemented by refactoring the code at https://demo.tabbyml.com/files/github/TabbyML/tabby/-/blob/a653febaf4ac9308647285aacd3cb78cfe9c026a/crates/tabby/src/routes/chat.rs?plain=1#L35 to support both streaming and non-streaming use cases, without the need to add a new API endpoint.
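
A rough sketch of that refactor, assuming an axum handler and an optional stream flag on the request body; every name below (ChatCompletionRequest, generate_chunks, and so on) is a placeholder rather than the actual code in chat.rs.

// Hypothetical sketch: one handler serving both streaming (SSE) and
// non-streaming clients, selected by an optional `stream` flag in the body.
// All types and helpers here are placeholders, not tabby's real code.
use std::convert::Infallible;

use axum::{
    response::{
        sse::{Event, Sse},
        IntoResponse, Response,
    },
    Json,
};
use futures::{stream, Stream, StreamExt};
use serde::Deserialize;
use serde_json::json;

#[derive(Deserialize)]
struct ChatMessage {
    role: String,
    content: String,
}

#[derive(Deserialize)]
struct ChatCompletionRequest {
    messages: Vec<ChatMessage>,
    #[serde(default)]
    stream: Option<bool>,
}

// Stand-in for whatever produces completion chunks in the real server.
fn generate_chunks(_messages: &[ChatMessage]) -> impl Stream<Item = String> {
    stream::iter(vec!["I am ".to_string(), "Tabby.".to_string()])
}

async fn chat_completions(Json(request): Json<ChatCompletionRequest>) -> Response {
    let chunks = generate_chunks(&request.messages);

    if request.stream.unwrap_or(true) {
        // Existing behavior: forward each chunk as a server-sent event.
        let events = chunks.map(|c| Ok::<_, Infallible>(Event::default().data(c)));
        Sse::new(events).into_response()
    } else {
        // Proposed behavior: drain the stream and reply with a single JSON body.
        let content: String = chunks.collect::<Vec<_>>().await.concat();
        Json(json!({ "content": content })).into_response()
    }
}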