Open wishatch opened 10 months ago
Which LLM are you using? Is it faster if you switch to a smaller model, or to an OpenAI one? It's expected to run slower with RAG, because the LLM gets fed more tokens.
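To make the "more tokens" point concrete, here is a rough sketch in Python of how much a prompt can grow once retrieved context is prepended. Everything in it is an assumption for illustration: the 4-characters-per-token heuristic is not a real tokenizer, `build_prompt` is a hypothetical helper rather than genai-stack's actual prompt template, and the chunk sizes are made up.

```python
from typing import Sequence


def rough_token_count(text: str) -> int:
    """Crude estimate: ~4 characters per token (heuristic, not a real tokenizer)."""
    return max(1, len(text) // 4)


def build_prompt(question: str, retrieved_chunks: Sequence[str]) -> str:
    """Hypothetical prompt builder: prepend retrieved chunks, if any, to the question."""
    if not retrieved_chunks:
        return f"Question: {question}"
    context = "\n\n".join(retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "How do I configure the loader?"
# Made-up retrieval result: four chunks of 2000 characters each.
chunks = ["x" * 2000 for _ in range(4)]

plain_tokens = rough_token_count(build_prompt(question, []))
rag_tokens = rough_token_count(build_prompt(question, chunks))

print(f"without RAG: ~{plain_tokens} tokens")
print(f"with RAG:    ~{rag_tokens} tokens")
```

On CPU, prompt processing time grows with the number of input tokens, so retrieving fewer or shorter chunks is one knob that may help.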
I am using:
- llama2 7b
- Ubuntu 22.04 LTS
- Docker Desktop for Windows (WSL2 enabled) v4.25.0
- a very high-end PC with server-class power
- the default configuration from the GitHub repo for everything else (no graphics card configuration)
I am experiencing the same issue, and wonder if there is any guide available to improve/benchmark the performance.
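I am not aware of an official benchmarking guide, but a simple way to compare setups is to time the same question with and without RAG. Below is a minimal sketch; `time_call` is a hypothetical helper, and the function being timed is a stand-in that you would replace with whatever actually sends the question to your stack.

```python
import statistics
import time
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> float:
    """Run fn(*args) `repeats` times and return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def fake_ask(question: str) -> str:
    """Stand-in for a real request to the LLM; replace with your own call."""
    time.sleep(0.01)
    return "answer to: " + question


median_seconds = time_call(fake_ask, "What is RAG?", repeats=3)
print(f"median latency: {median_seconds:.3f} s")
```

Running it once with RAG enabled and once with it disabled, on the same question and model, gives a comparable pair of numbers.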
Make sure you're running on GPU.
Is there a way to minimize the configuration of genai-stack so that it runs at a reasonable speed without a GPU? It doesn't need to be super fast. GPUs are expensive, and it would be good to get familiar with this stack first before purchasing a GPU card. Thanks very much for any advice.
genai-stack works well at a reasonable speed without RAG. But when RAG is activated, it runs very slowly. Any advice on how to solve this? Thanks.