Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Fork `openai` Python package to support llama.cpp specific features #207

Open tybalex opened 10 months ago

tybalex commented 10 months ago

To apply a grammar to a chat completion, the llamafile server expects a `grammar` field in the request body: https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/server.cpp#L2551

if (body.count("grammar") != 0) {
    llama_params["grammar"] = json_value(body, "grammar", json::object());
}

However, `grammar` is not a supported argument in the OpenAI API, so I can't do something like this:

oai_client.chat.completions.create(
    model=model,
    messages=messages,
    stream=stream,
    temperature=0.1,
    grammar=grammar,
)

Could this be supported in the future?

jart commented 10 months ago

If you're using the Python client library published by OpenAI, it's not going to support features that OpenAI doesn't have. The llamafile server does support grammar, however. For example, here's how to use grammar through the OpenAI-compatible API with curl:

curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "grammar": "root ::= [a-z]+ (\" \" [a-z]+)+",
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming."
    }
  ]
}' | python3 -c '
import json
import sys
json.dump(json.load(sys.stdin), sys.stdout, indent=2)
print()
'
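As a workaround that stays in Python, recent versions of the `openai` client (v1.x) accept an `extra_body` parameter that merges arbitrary fields into the request JSON, which lets you reach llamafile's `grammar` parameter without forking the package. A minimal sketch, assuming a llamafile server listening on localhost:8080:

from openai import OpenAI

# llamafile exposes an OpenAI-compatible endpoint; the API key is ignored.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Compose a poem that explains recursion."},
    ],
    # extra_body fields are merged into the request JSON as-is, so the
    # llamafile server sees "grammar" exactly as in the curl example above.
    extra_body={"grammar": 'root ::= [a-z]+ (" " [a-z]+)+'},
)
print(completion.choices[0].message.content)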
flatsiedatsie commented 10 months ago

> then it's not going to support features OpenAI doesn't have

I assume you mean Llamafile?

jart commented 10 months ago

No, I meant OpenAI. As far as I know, OpenAI hasn't devoted any engineering resources to adding support in their Python client library for features that are specific to llamafile and llama.cpp.