themanyone opened this issue 3 weeks ago
AFAIK support for glm3 and glm4 was already added: https://github.com/ggerganov/llama.cpp/pull/8031
Those are completely different files. That PR (https://github.com/ggerganov/llama.cpp/pull/8031) covered the CLI version (which other projects such as ollama also use, or wrap into a server) and GGUF creation. This issue is about the server example's web UI, which lets you choose a chat template when you run ./llama-server from the llama.cpp github repo and navigate to http://localhost:port in the browser.
Feature Description
ChatGLM3 uses a completely new prompt format. See https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md
I have created a patch, https://github.com/ggerganov/llama.cpp/commit/fd3492e85836c0df4b0404a47355159f4c349a44, for
examples/server/public/prompt-formats.js
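As a rough sketch of what that format looks like in code (an illustrative helper, not the contents of the patch above; the special tokens `<|system|>`, `<|user|>`, and `<|assistant|>` come from ChatGLM3's PROMPT_en.md):

```javascript
// Hypothetical helper: build a ChatGLM3-style prompt string from a system
// message and a list of conversation turns. The role tokens are taken from
// https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md; everything else
// (function name, argument shape) is made up for illustration.
function chatglm3Prompt(system, turns) {
  let out = `<|system|>\n${system}`;
  for (const { user, assistant } of turns) {
    // Each turn opens with the user message, then an <|assistant|> tag;
    // leaving the assistant text off the final turn cues the model to generate.
    out += `\n<|user|>\n${user}\n<|assistant|>`;
    if (assistant !== undefined) out += `\n${assistant}`;
  }
  return out;
}

console.log(chatglm3Prompt("You are a helpful assistant.", [{ user: "Hello" }]));
```

Getting these role tokens (and the trailing `<|assistant|>` cue) wrong is exactly what produces the repetitions and role reversals described below.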
Motivation
Fixes chat errors, repetitions, and role reversals when experimenting with the example server.
Possible Implementation
From ChatGLM3 README: