Request for image input support

Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM models, execute structured function calls and get structured output. Works also with models not fine-tuned to JSON output and function calls.

Other

426 stars 36 forks source link

Request for image input support #68

Open reachsak opened 1 month ago

reachsak commented 1 month ago

I plan to implement the function calling with vision models such as LLaVA and Nous-Hermes-2-Vision-Alpha based on the image, but it seems that the current implementation in the example folder only supports text input. It'd be great to have the image input support in the future version. Or please let me know if know a workaround to add image input support for this. Thank you,

Maximilian-Winter commented 4 weeks ago

@reachsak I will work on that. The problem I have at the moment, is that llama.cpp server stopped supporting images. But I will add it for vllm and TGI.