The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It allows users to chat with LLMs, execute structured function calls, and get structured output. It also works with models that are not fine-tuned for JSON output or function calling.
I plan to implement function calling with vision models such as LLaVA and Nous-Hermes-2-Vision-Alpha based on an image, but it seems that the current implementation in the example folder only supports text input. It would be great to have image input support in a future version. Alternatively, please let me know if you know a workaround for adding image input support.
Thank you,
@reachsak I will work on that. The problem I have at the moment is that the llama.cpp server stopped supporting images. But I will add it for vllm and TGI.
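Since vllm exposes an OpenAI-compatible chat API that accepts images as base64 data URIs, one possible workaround in the meantime is to build the multimodal message payload yourself and send it to such an endpoint. The sketch below only constructs the message dict; the helper name `build_image_message` is hypothetical and not part of llama-cpp-agent, and the exact payload shape your server accepts may differ.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    # Hypothetical helper, not part of llama-cpp-agent: encode the image as a
    # base64 data URI, the format used by OpenAI-compatible multimodal chat
    # endpoints (e.g. vllm's server), and pair it with the text prompt.
    data_uri = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }


# Example: the resulting dict can be placed in the "messages" list of a
# chat-completions request to an OpenAI-compatible multimodal server.
msg = build_image_message("Describe this image.", b"\x89PNG\r\n")
```

The structured-output or function-calling prompt text would go in the `"text"` part, so the vision model sees both the image and the instructions in one user turn.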