Is your feature request related to a problem? Please describe.
I'd like to use a vision model like LLaVA 1.6 together with a grammar (constrained generation) in llama-cpp-python.
Describe the solution you'd like
It would be nice if I could pass multimodal inputs to an LLM in llama-cpp-python with a grammar, for example with an interface like:

```python
response = llm(
    "JSON list of name strings of attractions in SF:",
    image="image.jpg",
    grammar=grammar,
    max_tokens=-1,
)
```
It turns out that a grammar can already be passed via the `grammar` parameter of `llm.create_chat_completion()` when using multimodal models, as described in the README.
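For reference, a minimal sketch of how that combination looks, following the multimodal and grammar sections of the README. The model/projector filenames and the GBNF grammar below are placeholders, not exact values from this issue:

```python
import base64

from llama_cpp import Llama, LlamaGrammar
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Local images can be passed to the chat handler as base64 data URIs.
def image_to_data_uri(path: str) -> str:
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode("utf-8")

# Placeholder paths: the LLaVA GGUF weights and the matching CLIP projector.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.6-mistral-7b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # image embeddings consume context, so leave room
)

# Illustrative GBNF grammar constraining output to a JSON list of strings.
grammar = LlamaGrammar.from_string(r'''
root   ::= "[" ws string (ws "," ws string)* ws "]"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
''')

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": image_to_data_uri("image.jpg")}},
                {"type": "text",
                 "text": "JSON list of name strings of attractions in SF:"},
            ],
        }
    ],
    grammar=grammar,
)
print(response["choices"][0]["message"]["content"])
```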