Validations

[ ] I believe this is a way to improve. I'll try to join the Continue Discord for questions
[X] I'm not able to find an open issue that requests the same enhancement
Problem
Chat becomes very slow when using the llama.cpp server as the conversation grows, because cache_prompt = true is not set when calling the /completion API of llama.cpp. Without it, llama.cpp reprocesses the entire previous message history on every prompt instead of reusing the cached prompt.
Solution
Please add the property cache_prompt = true when calling the /completion API, or expose it as a configuration property in config.json.
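For illustration, a minimal sketch of what the /completion request could look like with the flag set. The function name, endpoint URL, and fields other than cache_prompt are assumptions for the example, not Continue's actual implementation:

```typescript
// Sketch only: send a completion request to a local llama.cpp server
// with prompt caching enabled. llama.cpp's /completion endpoint accepts
// a boolean "cache_prompt" field; everything else here is illustrative.
async function completeWithCache(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,
      n_predict: 256,
      cache_prompt: true, // reuse the KV cache for the shared prefix of the prompt
    }),
  });
  const data = await response.json();
  return data.content;
}
```

If this were exposed through config.json instead, a per-model option (whatever name the maintainers choose) could simply be forwarded into this request body.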
Thanks