continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.
https://docs.continue.dev/
Apache License 2.0

Chat is very slow when using with the llama.cpp server #2845

Open · lehuythangit opened this issue 1 week ago

lehuythangit commented 1 week ago


Problem

Chat becomes very slow with the llama.cpp server as the conversation grows, because `cache_prompt = true` is missing from Continue's calls to llama.cpp's `/completion` API. As a result, llama.cpp re-processes the entire previous message history on every prompt instead of serving it from the cache.
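For context, llama.cpp's `/completion` endpoint accepts a `cache_prompt` flag. A minimal sketch of a direct request with it enabled (the server URL, prompt, and `n_predict` value are placeholders):

```typescript
// Minimal sketch: direct request to a llama.cpp server's /completion
// endpoint with cache_prompt enabled, so the server reuses the KV
// cache for the unchanged prefix of the prompt (the earlier chat
// history) instead of re-evaluating it on every turn.
async function complete(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,             // full rendered chat history + new message
      n_predict: 512,     // max tokens to generate
      cache_prompt: true, // reuse the cached prompt prefix
    }),
  });
  const data = await response.json();
  return data.content;
}
```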

Solution

Please add the `cache_prompt = true` property to the `/completion` API call, or expose it as a configuration option in `config.json`.
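If this were exposed through configuration, it might look something like the following in `config.json`. This is only a hypothetical shape: the `cache_prompt` entry under `completionOptions` does not exist today and the final property name would be up to the maintainers; the rest follows the documented llama.cpp provider setup.

```json
{
  "models": [
    {
      "title": "Llama.cpp",
      "provider": "llama.cpp",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8080",
      "completionOptions": {
        "cache_prompt": true
      }
    }
  ]
}
```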

Thanks

lehuythangit commented 1 week ago

As a workaround, I modified `.vscode\extensions\continue.continue-0.8.55-win32-x64\out\extension.js`:

[screenshot: the modified extension.js]
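The screenshot is not reproduced here, but the gist of the patch is a one-line addition. A sketch of the kind of change, assuming the bundled extension.js assembles the llama.cpp request body in one place (the real file is bundled and minified, so the variable names below are placeholders):

```typescript
// Hypothetical sketch of the workaround inside the bundled
// extension.js: wherever the llama.cpp /completion request body is
// assembled, add cache_prompt so the server reuses its prompt cache.
const body = {
  prompt: renderedPrompt, // placeholder name
  ...completionOptions,   // placeholder name
  cache_prompt: true,     // added by the workaround
};
```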