Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It allows users to chat with LLM models, execute structured function calls, and get structured output. It also works with models not fine-tuned for JSON output and function calls.

Multiple models context management like Ollama. #40

Closed svjack closed 2 months ago

svjack commented 2 months ago

With the help of llama-cpp-agent, I can use the function-calling and JSON-schema abilities of a llama model nearly perfectly. 😊 Suppose I want to use a code LLM like codellama to generate function tools, use hermes-2-pro-mistral-7b to call them as https://github.com/Maximilian-Winter/llama-cpp-agent/blob/master/examples/05_Agents/hermes_2_pro_agent.py does, and perhaps run another LLM via llama-cpp-python for other tasks. If I only have limited GPU memory, what troubles me is the lack of model-switching ability in llama-cpp-python, as discussed in https://github.com/abetlen/llama-cpp-python/issues/223.

Automatic model switching and GPU memory management are already handled by Ollama, but it lacks convenient function tools and JSON-schema output.

Alternatively, you could add model-switching ability to llama-cpp-agent, as https://github.com/abetlen/llama-cpp-python/issues/736 and https://github.com/abetlen/llama-cpp-python/issues/302 suggest.
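Until something like this lands in the library, a minimal sketch of the "switch" pattern is to keep at most one model resident and drop it before loading the next. The `ModelSwitcher` class and its `loaders` dict below are hypothetical, not part of llama-cpp-agent or llama-cpp-python; the sketch also assumes that releasing all references to a llama-cpp-python `Llama` object (and running garbage collection) frees its GPU memory.

```python
import gc


class ModelSwitcher:
    """Keep at most one model loaded; swap on demand.

    `loaders` maps a model name to a zero-argument callable that loads
    and returns the model. With llama-cpp-python this might be e.g.
    lambda: Llama(model_path="hermes-2-pro-mistral-7b.gguf", n_gpu_layers=-1)
    (hypothetical usage, not verified against the library).
    """

    def __init__(self, loaders):
        self.loaders = loaders
        self.current_name = None
        self.current_model = None

    def get(self, name):
        if name == self.current_name:
            return self.current_model
        # Drop the old model first so its memory can be reclaimed
        # before the next model is loaded.
        self.current_model = None
        gc.collect()
        self.current_model = self.loaders[name]()
        self.current_name = name
        return self.current_model
```

This trades latency for memory: every switch pays a full model reload, unlike Ollama, which also keeps an idle timeout and LRU-style management. Whether GPU memory is actually returned promptly depends on llama-cpp-python's cleanup behavior, so this is only a starting point, not a drop-in solution.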

How can I tackle this? Looking forward to your reply. 😊