ggerganov / llama.cpp

LLM inference in C/C++

Feature Request: Add support for chatglm3 in example server. #9164

Open themanyone opened 3 weeks ago

themanyone commented 3 weeks ago

Feature Description

ChatGLM3 uses a completely new prompt format. See https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md

I have created a patch, https://github.com/ggerganov/llama.cpp/commit/fd3492e85836c0df4b0404a47355159f4c349a44, for examples/server/public/prompt-formats.js.
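For reference, here is a minimal sketch of what such an entry might look like. The field names (`template`, `historyTemplate`, `char`, `user`) and the `{{...}}` placeholders are assumptions about that file's schema, not taken from the linked patch, which is authoritative:

```js
// Hypothetical shape of a ChatGLM3 entry for
// examples/server/public/prompt-formats.js; field names are assumed.
const chatglm3 = {
  // {{prompt}} = system prompt, {{history}} = the rendered turns so far
  template: "<|system|>\n{{prompt}}\n{{history}}<|assistant|>\n",
  // one rendered turn: {{name}} is "user" or "assistant"
  historyTemplate: "<|{{name}}|>\n{{message}}\n",
  char: "assistant",
  user: "user",
};
```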

Motivation

Fixes chat errors, repetitions, and role reversals when playing with the example server.

Possible Implementation

From ChatGLM3 README:

Overall Structure

The format of the ChatGLM3 dialogue consists of several conversations, each of which contains a dialogue header and content. A typical multi-turn dialogue structure is as follows:

```
<|system|>
You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.
<|user|>
Hello
<|assistant|>
Hello, I'm ChatGLM3. What can I assist you today?
```
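For illustration only, a small JavaScript sketch (not part of the patch; the function name and message shape are hypothetical) that assembles this multi-turn structure:

```js
// Hypothetical helper: render an array of {role, content} messages into
// the ChatGLM3 prompt format quoted above. Roles are expected to be
// "system", "user", or "assistant".
function buildChatGLM3Prompt(messages) {
  let prompt = "";
  for (const { role, content } of messages) {
    prompt += `<|${role}|>\n${content}\n`;
  }
  // A trailing <|assistant|> tag cues the model to generate the next reply.
  return prompt + "<|assistant|>\n";
}

// Example: reproduces the structure shown above.
const prompt = buildChatGLM3Prompt([
  { role: "system", content: "You are ChatGLM3, a large language model trained by Zhipu.AI." },
  { role: "user", content: "Hello" },
]);
```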
ngxson commented 3 weeks ago

AFAIK support for glm3 and glm4 has already been added: https://github.com/ggerganov/llama.cpp/pull/8031

themanyone commented 3 weeks ago

Those are completely different files. https://github.com/ggerganov/llama.cpp/pull/8031 covered the CLI version (which other projects such as ollama also wrap into a server) and the GGUF creation. This request is for the server example's web UI, which lets you choose a chat template when you run ./llama-server from the llama.cpp repo and navigate to http://localhost:port in the browser.