ggerganov / llama.cpp

LLM inference in C/C++

Feature Request: Add support for chatglm3 in example server. #9164

Open themanyone opened 3 weeks ago

themanyone commented 3 weeks ago

Feature Description

ChatGLM3 uses a completely new prompt format. See https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md

I have created a patch, https://github.com/ggerganov/llama.cpp/commit/fd3492e85836c0df4b0404a47355159f4c349a44, for examples/server/public/prompt-formats.js.
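For reference, here is a minimal sketch of what such an entry might look like. The field names (`template`, `historyTemplate`, `char`, `user`) and the `{{...}}` placeholders are assumptions about that file's schema, not taken from the linked patch, which is authoritative:

```js
// Hypothetical shape of a ChatGLM3 entry for
// examples/server/public/prompt-formats.js; field names are assumed.
const chatglm3 = {
  // {{prompt}} = system prompt, {{history}} = the rendered turns so far
  template: "<|system|>\n{{prompt}}\n{{history}}<|assistant|>\n",
  // one rendered turn: {{name}} is "user" or "assistant"
  historyTemplate: "<|{{name}}|>\n{{message}}\n",
  char: "assistant",
  user: "user",
};
```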

Motivation

Fixes chat errors, repetitions, and role reversals when playing with the example server.

Possible Implementation

From ChatGLM3 README:

Overall Structure

The format of the ChatGLM3 dialogue consists of several conversations, each of which contains a dialogue header and content. A typical multi-turn dialogue structure is as follows:

```
<|system|>
You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.
<|user|>
Hello
<|assistant|>
Hello, I'm ChatGLM3. What can I assist you today?
```
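For illustration only, a small JavaScript sketch (not part of the patch; the function name and message shape are hypothetical) that assembles this multi-turn structure:

```js
// Hypothetical helper: render an array of {role, content} messages into
// the ChatGLM3 prompt format quoted above. Roles are expected to be
// "system", "user", or "assistant".
function buildChatGLM3Prompt(messages) {
  let prompt = "";
  for (const { role, content } of messages) {
    prompt += `<|${role}|>\n${content}\n`;
  }
  // A trailing <|assistant|> tag cues the model to generate the next reply.
  return prompt + "<|assistant|>\n";
}

// Example: reproduces the structure shown above.
const prompt = buildChatGLM3Prompt([
  { role: "system", content: "You are ChatGLM3, a large language model trained by Zhipu.AI." },
  { role: "user", content: "Hello" },
]);
```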
ngxson commented 3 weeks ago

AFAIK support for glm3 and glm4 has already been added: https://github.com/ggerganov/llama.cpp/pull/8031

themanyone commented 3 weeks ago

Those are completely different files. https://github.com/ggerganov/llama.cpp/pull/8031 covered the CLI version (which other projects such as ollama also wrap into a server) and the GGUF creation. This request is for the server example's web UI, which lets you choose a chat template when you run ./llama-server from the llama.cpp repo and navigate to http://localhost:port in the browser.