Benjoyo opened 8 months ago
I don't quite understand how it works. The only difference in format is that after you get the result JSON from your function call, you have to return it through a role of "tool" instead of "user" or "assistant"; a standard ChatML-compatible format should be able to handle that.
Unless I'm misunderstanding?
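For reference, here is a minimal sketch (the IDs, names, and values are made up) of what returning the result through the tool role looks like in the OpenAI-style message format:

# Hypothetical example of feeding a function result back via the "tool" role
messages = [
    {"role": "user", "content": "Let me get the #3 and the #12."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",  # placeholder id
                "type": "function",
                "function": {
                    "name": "store_orders",
                    "arguments": '{"orders": [{"number": 3, "customer_name": "Mike"}]}',
                },
            }
        ],
    },
    # The function result goes back as a "tool" message instead of "user"/"assistant":
    {"role": "tool", "tool_call_id": "call_1", "content": '{"status": "ok"}'},
]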
OK, so using the chatml-function-calling format (yes, a ChatML-compatible format) with Hermes Pro does actually work. But that is only because the model is so good and the handler injects a system prompt explaining the function calling format, which Hermes Pro then follows (or at least the grammar sampling makes it do so), even though it was trained on a different format.
Here is the formatted chatml-function-calling prompt for reference:
<|im_start|>system
Extract all orders from the user input as a JSON list.
You have access to the following functions:
functions.store_orders:
{"properties": {"orders": {"description": "the orders", "items": {"properties": {"customer_name": {"description": "the name of the customer that this order is for", "type": "string"}, "number": {"description": "the order number", "type": "number"}}, "required": ["number", "customer_name"], "type": "object"}, "type": "array"}}, "required": ["orders"], "type": "object"}
You can respond to users messages with either a single message or one or more function calls.
To respond with a message begin the message with 'message:', use the following format:
message:
<message>
To respond with one or more function calls begin the message with 'functions.<function_name>:', use the following format:
functions.<function_name>:
{ "arg1": "value1", "arg2": "value2" }
functions.<function_name>:
{ "arg1": "value1", "arg2": "value2" }<|im_end|>
<|im_start|>user
Hello it's Mike. Let me get the #3 and the #12. For my friend Jeff I would like the #2.<|im_end|>
<|im_start|>assistant
Now, to get the best performance one would want to use the correct Hermes Pro format, which does not work with this handler: it forces pure JSON output (without the XML tags) for a fixed tool_choice, and for auto tool_choice it stops at a ':', following the other format.
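To make the difference concrete (the order values are made up): the handler's format expects a call to look like

functions.store_orders:
{"orders": [{"number": 3, "customer_name": "Mike"}]}

whereas the native Hermes Pro format wraps it in XML tags:

<tool_call>
{"arguments": {"orders": [{"number": 3, "customer_name": "Mike"}]}, "name": "store_orders"}
</tool_call>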
But looking at it closer as I did just now, I might be able to adapt this myself (or have Claude Opus do it lol).
Hi @Benjoyo, I just had a look at the chatml-function-calling handler, and while I understand how tools are passed into the prompt as function signatures, like @teknium1 I don't understand how exactly function call parsing is done.
For chatml-function-calling you could change the system prompt in the Jinja template, but the method for adding function signatures to the system prompt would remain the same.
# System message
"{% if message.role == 'system' %}"
"{{ message.content }}"
"{% if tool_calls %}"
"\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions."
"\nHere are the available tools:"
"\n<tools> tools </tools>"
"\n\nYou can respond to users messages with either a single message or one or more function calls."
"\n\nTo respond with a message begin the message with 'message:', use the following format:"
"\n\nmessage:"
"\n<message>"
"\nFor each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:"
"\n<tool_call>"
"\n{"arguments": <args-dict>, "name": <function-name>}"
"\n</tool_call>"
"{% endif %}"
"<|im_end|>\n"
"{% endif %}"
For parsing function calls, you'd have to add a parsing function such as the one below:
import ast
import json
import xml.etree.ElementTree as ET

from pydantic import BaseModel


class FunctionCall(BaseModel):
    arguments: dict
    name: str


def parse_function_calls(completion):
    # Extract FunctionCall objects from <tool_call></tool_call> blocks in a completion
    function_calls = []
    # Wrap the completion in a root element so it parses as well-formed XML
    root = ET.fromstring(f"<root>{completion}</root>")
    for tool_call in root.findall(".//tool_call"):
        try:
            function_call_json = json.loads(tool_call.text.strip())
            function_calls.append(FunctionCall(**function_call_json))
        except json.JSONDecodeError:
            try:
                # Try parsing with ast.literal_eval if json.loads fails
                function_call_json = ast.literal_eval(tool_call.text.strip())
                function_calls.append(FunctionCall(**function_call_json))
            except (ValueError, SyntaxError):
                pass
    return function_calls
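For example, with a made-up completion string it would be used like this:

# Example usage with a hypothetical Hermes-style completion
sample = (
    '<tool_call>\n'
    '{"arguments": {"orders": [{"number": 3, "customer_name": "Mike"}]}, "name": "store_orders"}\n'
    '</tool_call>'
)
for call in parse_function_calls(sample):
    print(call.name, call.arguments)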
It looks like this is how they are parsing function calls in the functionary chat handler, which you could modify by using the function above:
if function_call is None or (
    isinstance(function_call, str) and function_call == "auto"
):
    stop = "\n"
    completion: llama_types.Completion = llama.create_completion(
        prompt=prompt, stop=stop, stream=False
    )  # type: ignore
    completion_text = completion["choices"][0]["text"]
    # strip " to=functions." and ending ":"
    function_call = completion_text.split(".")[-1][:-1]
    new_prompt = prompt + completion_text + stop
But for the chatml-function-calling handler I couldn't understand exactly how they are parsing function calls.
Hope this helps 🙂
Thanks @interstellarninja for the hints. I will have a closer look and try to make a PR soon.
On another look, my guess is that they are relying entirely on grammars for parsing the function calls in both handlers.
Here's how they define grammars from the function signatures in the chatml-function-calling handler:
# One or more function calls
tool_name = text[len("functions.") :]
tool = next((tool for tool in tools if tool["function"]["name"] == tool_name), None)
if not stream:
    completions = []
    completions_tool_name = []
    while tool is not None:
        prompt += f"functions.{tool_name}:\n"
        try:
            grammar = llama_grammar.LlamaGrammar.from_json_schema(
                json.dumps(tool["function"]["parameters"]), verbose=llama.verbose
            )
        except Exception as e:
            grammar = llama_grammar.LlamaGrammar.from_string(
                llama_grammar.JSON_GBNF, verbose=llama.verbose
            )
            if llama.verbose:
                print(
                    "Failed to parse function body as JSON schema, falling back to default grammar"
                )
                print(e)
So perhaps changing the system prompt and a few other things might just work, even if function calls are wrapped in XML tags.
Yeah, I think one needs to change the stop sequence to the opening tag to detect the model wanting to generate a call, then use the grammar, and then stop at the closing tag. These stop criteria are different from the other format. The rest could be very similar or the same.
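Something like this rough sketch (hypothetical, not the actual handler code; it reuses llama, llama_grammar, prompt, and tools from the snippets above, and wants_tool_call is just a placeholder check):

import json

# 1. Let the model generate freely, but stop as soon as it opens a tool call.
completion = llama.create_completion(prompt=prompt, stop=["<tool_call>"], stream=False)
text = completion["choices"][0]["text"]

# 2. If the model wanted to call a tool, constrain the call body with a grammar and
#    stop at the closing tag. A real implementation would build a grammar covering the
#    full {"arguments": ..., "name": ...} object for all tools; this just illustrates
#    the idea with the first tool's parameters.
if wants_tool_call(completion):  # placeholder: detect that the opening tag was hit
    grammar = llama_grammar.LlamaGrammar.from_json_schema(
        json.dumps(tools[0]["function"]["parameters"]), verbose=llama.verbose
    )
    call = llama.create_completion(
        prompt=prompt + text + "<tool_call>\n",
        grammar=grammar,
        stop=["</tool_call>"],
        stream=False,
    )
    arguments = json.loads(call["choices"][0]["text"])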
Hey, @adrienbrault has implemented the OpenAI function calling format for Ollama; maybe this serves as a guide even though it is written in Go. https://github.com/ollama/ollama/pull/3303
Hey, thank you so much for the great model and this repo!
Would you be willing to add support for this chat format to llama-cpp-python, so that we can use function calling (and JSON mode) with its OpenAI-compatible server?
Right now, llama-cpp-python offers the only OpenAI-compatible server with constrained/grammar-based sampling for CPU that I am aware of. It has been very convenient to use with the functionary models, as it is plug & play with the openai client and very reliable thanks to the grammar sampling.
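For context, here's roughly what that looks like with functionary today (the base URL, port, and model alias are just example values for a locally running llama-cpp-python server):

from openai import OpenAI

# Point the standard openai client at a local llama-cpp-python server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="functionary",  # example model alias configured on the server
    messages=[{"role": "user", "content": "Let me get the #3 and the #12."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "store_orders",
                "description": "Store the orders extracted from the user input",
                "parameters": {
                    "type": "object",
                    "properties": {"orders": {"type": "array"}},
                    "required": ["orders"],
                },
            },
        }
    ],
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)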
Besides functionary, there is already support for a format called chatml-function-calling which might be similar enough to the Hermes format to be able to just adapt it instead of writing something from scratch:
https://github.com/abetlen/llama-cpp-python/blob/6eb25231e4dafeb792ffc9597c27330344c970b1/llama_cpp/llama_chat_format.py#L2045
All that would need to be added to the library is a handler like that.
Thanks!