OoriData / Toolio

AI API implementation for Mac which supports tool calling & other structured LLM response generation (e.g. conforming to a JSON schema)

Model type isn't a reliable way to determine e.g. whether the system role is supported #7

Open uogbuji opened 1 month ago

uogbuji commented 1 month ago

In adding support for Gemma models (#6) I set up model flags per model type, including model_flag.NO_SYSTEM_ROLE, which records whether or not the system role is supported.
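
For context, a minimal sketch of that per-model-type flag approach. Only model_flag.NO_SYSTEM_ROLE is named above; the lookup table name and the gemma entries are illustrative, not necessarily toolio's actual code:

from enum import Flag, auto

class model_flag(Flag):
    NO_SYSTEM_ROLE = auto()  # model's chat template rejects the system role

# Hypothetical lookup keyed on the model type reported at load time
FLAGS_BY_MODEL_TYPE = {
    'gemma': model_flag.NO_SYSTEM_ROLE,
    'gemma2': model_flag.NO_SYSTEM_ROLE,
}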

Then I ran into h2oai/h2o-danube3-4b-chat, which loads as a llama model type but also doesn't support the system role.

We may actually have to do some sort of check via the tokenizer's chat template. Failing that, we might have to make it a user flag.
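
One possible template-based check, as a rough sketch. The helper name here is made up; it assumes a loaded transformers tokenizer whose chat template raises on unsupported roles, as in the traceback below:

from jinja2.exceptions import TemplateError

def supports_system_role(tokenizer):
    '''Probe the chat template with a trivial system message'''
    probe = [
        {'role': 'system', 'content': 'ping'},
        {'role': 'user', 'content': 'ping'},
    ]
    try:
        tokenizer.apply_chat_template(probe)
        return True
    except TemplateError:
        # e.g. danube's template raises 'System role not supported'
        return False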

# On server:
toolio_server --model=$HOME/.local/share/models/mlx/h2o-danube3-4b-chat-4bit
# On client:
echo 'What is the square root of 256?' > /tmp/llmprompt.txt
echo '{"tools": [{"type": "function","function": {"name": "square_root","description": "Get the square root of the given number","parameters": {"type": "object", "properties": {"square": {"type": "number", "description": "Number from which to find the square root"}},"required": ["square"]},"pyfunc": "math|sqrt"}}], "tool_choice": "auto"}' > /tmp/toolspec.json
toolio_request --apibase="http://localhost:8000" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json
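
For readability, the same tool spec written to /tmp/toolspec.json, pretty-printed:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "square_root",
        "description": "Get the square root of the given number",
        "parameters": {
          "type": "object",
          "properties": {
            "square": {
              "type": "number",
              "description": "Number from which to find the square root"
            }
          },
          "required": ["square"]
        },
        "pyfunc": "math|sqrt"
      }
    }
  ],
  "tool_choice": "auto"
}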

Server error excerpt:

  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/toolio/cli/server.py", line 267, in post_v1_chat_completions_impl
    for result in app.state.model.completion(
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/toolio/schema_helper.py", line 293, in completion
    prompt_tokens = self.tokenizer.encode_prompt(prompt)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/llm_structured_output/util/tokenization.py", line 34, in encode_prompt
    return self.tokenizer.apply_chat_template(prompt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1812, in apply_chat_template
    rendered_chat = compiled_template.render(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 1, in top-level template code
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1852, in raise_exception
    raise TemplateError(message)
jinja2.exceptions.TemplateError: System role not supported
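
Once we can detect this (or the user flags it), the handling could be something like folding the system message into the first user turn. This is a common workaround pattern, sketched here, not current toolio behavior:

def fold_system_into_user(messages):
    '''Merge a leading system message into the first user message'''
    if not messages or messages[0].get('role') != 'system':
        return messages
    system_msg, rest = messages[0], messages[1:]
    if rest and rest[0].get('role') == 'user':
        merged = {'role': 'user',
                  'content': system_msg['content'] + '\n\n' + rest[0]['content']}
        return [merged] + rest[1:]
    # No user turn to merge into; demote the system message to a user turn
    return [{'role': 'user', 'content': system_msg['content']}] + rest
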
uogbuji commented 1 month ago

BTW, I'm eager to try out Danube because surely there's no way we could do tool-calling with a 500M model, but you never know! 😁