I don't see the point of setting the chat_llm to ollama if there is no provision for also pointing utility_llm and/or embedding_llm to local (ollama) counterparts. Yes, I assume that prompting will be a challenge, so I suggest "run modes" defined by the configured models.
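To make the idea concrete, here is a minimal sketch of what such a "run mode" could look like. The mode names, model names, and the `RUN_MODES` / `resolve_models` helpers are purely illustrative, not existing config keys; the point is only that one mode bundles chat, utility, and embedding models so all three are switched together.

```python
# Hypothetical "run mode" presets -- names are illustrative, not actual config keys.
RUN_MODES = {
    "openai": {
        "chat_llm": "gpt-4o",
        "utility_llm": "gpt-4o-mini",
        "embedding_llm": "text-embedding-3-small",
    },
    "ollama": {
        "chat_llm": "ollama/llama3",
        "utility_llm": "ollama/llama3",
        "embedding_llm": "ollama/nomic-embed-text",
    },
}

def resolve_models(mode: str) -> dict:
    """Return the model triple for a run mode, so all three LLM roles switch together."""
    try:
        return RUN_MODES[mode]
    except KeyError:
        raise ValueError(f"Unknown run mode: {mode!r}") from None
```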
The prompts directory contains the prompts in markdown format. Maybe it would help to split the markdown files into subfolders, one per model, since it is probably easier to engineer prompts for a particular model.
As a starting point, we could copy the current "master prompts" and refine them for the configured model; a sketch of how the lookup could work follows below.
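Again just a sketch, assuming a hypothetical layout of `prompts/<model>/<name>.md` next to the existing master copies in `prompts/`: a loader could prefer the model-specific prompt and fall back to the master prompt when no refined version exists yet.

```python
from pathlib import Path

# Assumed layout: prompts/<model>/<name>.md, with the current master prompts in prompts/<name>.md
PROMPTS_DIR = Path("prompts")

def load_prompt(name: str, model: str) -> str:
    """Prefer a model-specific prompt; fall back to the shared "master" prompt."""
    model_specific = PROMPTS_DIR / model / f"{name}.md"
    if model_specific.exists():
        return model_specific.read_text(encoding="utf-8")
    return (PROMPTS_DIR / f"{name}.md").read_text(encoding="utf-8")
```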
What do you think?