Is your feature request related to a problem? Please describe.
I'd like to use a vision model like LLaVA 1.6 together with a grammar (constrained generation) in llama-cpp-python.
Describe the solution you'd like
It would be nice if I could pass multimodal inputs to an LLM in llama-cpp-python with a grammar, for example with an interface like:

```python
response = llm(
    "JSON list of name strings of attractions in SF:",
    image="image.jpg",
    grammar=grammar,
    max_tokens=-1,
)
```
It turns out that a grammar can already be passed via the `grammar` parameter of `llm.create_chat_completion()` when using multimodal models, as described in the README.
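For reference, a minimal sketch of how that combination looks, following the multimodal and grammar sections of the README. The model/projector filenames and the GBNF grammar below are placeholders, not exact values from this issue:

```python
import base64

from llama_cpp import Llama, LlamaGrammar
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Local images can be passed to the chat handler as base64 data URIs.
def image_to_data_uri(path: str) -> str:
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode("utf-8")

# Placeholder paths: the LLaVA GGUF weights and the matching CLIP projector.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.6-mistral-7b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # image embeddings consume context, so leave room
)

# Illustrative GBNF grammar constraining output to a JSON list of strings.
grammar = LlamaGrammar.from_string(r'''
root   ::= "[" ws string (ws "," ws string)* ws "]"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
''')

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": image_to_data_uri("image.jpg")}},
                {"type": "text",
                 "text": "JSON list of name strings of attractions in SF:"},
            ],
        }
    ],
    grammar=grammar,
)
print(response["choices"][0]["message"]["content"])
```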