You can already access `chat_template` from the `/props` endpoint: https://github.com/ggerganov/llama.cpp/pull/8337
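For illustration, a minimal sketch (in TypeScript) of how the web UI could read that field; only the `chat_template` property itself comes from the PR above, the surrounding code, default URL, and error handling are assumptions:

```ts
// Sketch: read the model's chat template from the /props endpoint.
// Assumes the server exposes `chat_template` there (per PR #8337); the rest
// of the response shape is illustrative, not the authoritative schema.
async function fetchChatTemplate(baseUrl = "http://localhost:8080"): Promise<string> {
  const res = await fetch(`${baseUrl}/props`);
  if (!res.ok) {
    throw new Error(`/props request failed: ${res.status}`);
  }
  const props = await res.json();
  // `chat_template` holds the raw Jinja template string embedded in the model metadata.
  return props.chat_template ?? "";
}
```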
IMO we can start by searching for a lightweight Jinja parser in JS that could fit into the project. If we can't find one, then let's go with the `chat_example` approach.
I have done a bit of research regarding JS implementations of Jinja. So far, I am aware of two candidates:
I have tested several templates from HF models with the online demo of jinja-js, and so far found the following features to be unsupported (while being used in some templates):
- `raise_exception`
- `.strip()`
- `|trim` filter
- `range(0, messages|length)`
- `messages[1:]`
Some of these should probably be easy to add, but of course we cannot guarantee full compatibility with Python expressions. Overall, I am getting about 50% of templates working out-of-the-box.
IMO I'd prefer not to deliver a half-working version.
In any case, I don't see many real benefits of having a Jinja parser in the llama.cpp server web UI, given that we already have the `/chat/completions` endpoint that can handle chat formatting in cpp code.
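As a rough sketch of that point: the UI can send plain messages to the OpenAI-compatible chat endpoint and let the server apply the template in C++. The payload and response handling below are illustrative, and the system prompt and base URL are placeholders:

```ts
// Sketch: let the server do the chat formatting instead of rendering Jinja in JS.
// The body follows the OpenAI-compatible chat completions shape accepted by the
// llama.cpp server; all values here are placeholders.
async function chat(baseUrl: string, userText: string): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: userText },
      ],
    }),
  });
  const data = await res.json();
  // The server applies the chat template internally; the UI never sees the Jinja source.
  return data.choices?.[0]?.message?.content ?? "";
}
```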
There is already one implementation of such a web UI, but it's kind of a one-off (some people just want to push their code into the repo without thinking about the long-term impact). It would be nice if we could somehow integrate it into the main UI: https://github.com/ggerganov/llama.cpp/tree/master/examples/server/public_simplechat
Sounds good to me. Is there already a long-term plan somewhere for what the server UI should look like? I could see something like this:
- `/chat/completion` endpoint (default, for most users)
- `/completion` endpoint (with the existing template code in the new UI, for interactive experiments with custom template tweaks)

Switching between the first two options could be a checkbox below "chat" mode.
Would it make sense to track such a plan (or a similar one) as a separate issue?
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Feature Description
Currently there is no way of retrieving information about the recommended chat template for a model when using the `/completion` endpoint of the server. The idea of this feature is to add a new property `chat_example` to the `/props` endpoint that returns the same information that is already logged when the server starts up.

This goes in the same direction as the suggestion in #5447 (closed as "stale"), but now explicitly uses the ability to read templates from models and execute them with `llama_chat_apply_template`.

Motivation
This came up in #8196 as a way of accessing the built-in templates also from the server UI, with a formatted chat being potentially easier to work with than a full Jinja2 template, at least for common cases where the template is not too complex. This would enable a number of extensions to using templates with the `/completion` endpoint.

Possible Implementation
- Keep the `chat_example` string (that is already logged) for future use.
- In the `/props` response, return the `chat_example` string at the top level, next to the `system_prompt`.
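For illustration only, consuming the proposed field from the web UI could look roughly like this; the interface lists just the fields discussed in this issue, not the full `/props` schema, and the example string in the comment is a hypothetical ChatML-style rendering:

```ts
// Sketch of consuming the proposed field. The interface is deliberately minimal;
// the real /props response contains more fields, and the example string below
// is purely illustrative of what chat_example might look like for a ChatML model.
interface ServerProps {
  system_prompt: string;
  chat_example: string; // e.g. "<|im_start|>system\nYou are a helpful assistant<|im_end|>\n..."
}

async function loadProps(baseUrl = "http://localhost:8080"): Promise<ServerProps> {
  const res = await fetch(`${baseUrl}/props`);
  return (await res.json()) as ServerProps;
}

// Usage: show the formatted example so users can adapt it for raw /completion prompts.
loadProps().then((p) => console.log(p.chat_example));
```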