mgroeber9110 opened 1 month ago
You can already access `chat_template` from the `/props` endpoint: https://github.com/ggerganov/llama.cpp/pull/8337
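For reference, a minimal sketch of reading that field from a running server (the address is an assumption; adjust to your setup):

```ts
// Fetch the raw Jinja template string that PR #8337 exposes on /props.
const props = await (await fetch("http://localhost:8080/props")).json();
console.log(props.chat_template);
```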
IMO we can start searching for a lightweight Jinja parser in JS that fits the project. If we can't find one, then let's go the `chat_example` way.
I have done a bit of research regarding JS implementations of Jinja. So far, I am aware of two candidates:
I have tested several templates from HF models with the Online Demo of jinja-js, and so far found the following features unsupported (while being used in some templates); a contrived fragment combining them follows this list:

- `raise_exception`
- `.strip()`
- the `|trim` filter
- `range(0, messages|length)`
- `messages[1:]` (slicing)
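For illustration, a contrived Jinja fragment (not taken from any real model template) that combines the constructs above:

```jinja
{# Contrived example; real HF chat templates use these constructs in similar ways #}
{% if messages[0]['role'] != 'system' %}
    {{ raise_exception('First message must be a system prompt') }}
{% endif %}
{% for i in range(0, messages|length) %}
    {{ messages[i]['content'].strip() }}
{% endfor %}
{% for message in messages[1:] %}
    {{ message['content'] | trim }}
{% endfor %}
```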
Some of these should probably be easy to add, but of course we cannot guarantee full compatibility with Python expressions. Overall, I am getting about 50% of templates working out-of-the-box.
IMO I'd prefer not to deliver a half-working version.
In any case, I don't see much real benefit in having a Jinja parser in the llama.cpp server web UI, given that we already have the `/chat/completions` endpoint that can handle chat formatting in C++ code.
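For example, a minimal sketch of letting the server do the formatting (assuming a llama.cpp server on localhost:8080; the response shape follows the OpenAI-compatible API):

```ts
// Send raw messages; the server applies the model's chat template in C++.
const res = await fetch("http://localhost:8080/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello!" },
    ],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```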
There is already one implementation of such a web UI, but it is kind of a one-off (some people just want to push their code into the repo without thinking about long-term impact). It would be nice if we could somehow integrate it into the main UI: https://github.com/ggerganov/llama.cpp/tree/master/examples/server/public_simplechat
Sounds good to me. Is there already a plan somewhere for what the server UI should look like long-term? I could see something like this:

- `/chat/completions` endpoint (default, for most users)
- `/completion` endpoint (with the existing template code in the new UI, for interactive experiments with custom template tweaks)

Switching between the first two options could be a checkbox below "chat" mode.
Would it make sense to track such a plan (or a similar one) as a separate issue?
Prerequisites
Feature Description
Currently there is no way of retrieving information about the recommended chat template for a model when using the `/completion` endpoint of the server. The idea of this feature is to add a new property `chat_example` to the `/props` endpoint that returns the same information that is already logged when the server starts up.

This goes in the same direction as the suggestion in #5447 (closed as "stale"), but now explicitly uses the ability to read templates from models and execute them with `llama_chat_apply_template`.

Motivation
This came up in #8196 as a way of accessing the built-in templates also from the server UI, with a formatted chat being potentially easier to work with than a full Jinja2 template, at least for common cases where the template is not too complex. This would enable a number of extensions to using templates with the `/completion` endpoint.

Possible Implementation
- Store the `chat_example` string (that is already logged) for future use.
- In the `/props` response, return the `chat_example` string at the top level, next to the `system_prompt`.
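For illustration, the extended `/props` response might then look roughly like this (a hedged sketch; the `chat_example` value is a made-up ChatML-style string, not actual server output, and other fields are omitted):

```json
{
  "system_prompt": "",
  "chat_template": "{% for message in messages %}...{% endfor %}",
  "chat_example": "<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"
}
```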