genai-stack runs very slow when RAG is activated

docker / genai-stack

Langchain + Docker + Neo4j + Ollama

Creative Commons Zero v1.0 Universal


genai-stack runs very slow when RAG is activated #93

Open wishatch opened 10 months ago

wishatch commented 10 months ago

genai-stack works well at a reasonable speed without RAG, but when RAG is activated it runs very slowly. Any advice on how to solve this? Thanks.

oskarhane commented 10 months ago

What LLM are you using? Is it faster if you switch to a smaller one, or to an OpenAI model? Some slowdown is expected with RAG, because the retrieved context is added to the prompt and the LLM has to process more tokens.
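To get a feel for how much larger the prompt becomes with RAG, here is a rough sketch. The prompt layout, the chunk sizes, and the 4-characters-per-token rule of thumb are all assumptions for illustration; the actual template and tokenizer used by genai-stack and llama2 may differ.

```python
# Rough illustration only: ~4 characters per token is a common rule of thumb,
# not the tokenizer that llama2 actually uses.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)


def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Hypothetical prompt layout; the real template in genai-stack may differ.
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


question = "How do I configure a Neo4j vector index?"
# Hypothetical retrieved chunks, each about 1,200 characters long.
chunks = ["x" * 1200 for _ in range(4)]

plain_tokens = estimate_tokens(question)
rag_tokens = estimate_tokens(build_rag_prompt(question, chunks))

print(f"plain prompt: ~{plain_tokens} tokens")
print(f"RAG prompt:   ~{rag_tokens} tokens")
print(f"ratio:        ~{rag_tokens / plain_tokens:.0f}x")
```

On CPU, prompt processing time grows with the number of input tokens, so a prompt that is many times longer can account for much of the slowdown on its own.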

wishatch commented 10 months ago

I am using:

- llama2 7b
- Ubuntu 22.04 LTS
- Docker Desktop for Windows (WSL2 enabled) v4.25.0
- a very high-end PC server
- the default configuration from the GitHub repo for everything else (no graphics card configuration)

JasonPad19 commented 10 months ago

I am experiencing the same issue, and wonder if there is any guide available to improve/benchmark the performance.

oskarhane commented 10 months ago

Make sure you're running on GPU.
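One way to check this, sketched under some assumptions: recent Ollama versions expose a `/api/ps` endpoint that lists loaded models with their total `size` and the part held in GPU memory (`size_vram`). The base URL below is an assumption (inside the stack's containers it may be something other than `localhost`), and older Ollama versions may not have this endpoint at all.

```python
import json
import urllib.request
from urllib.error import URLError


def gpu_fraction(ps_response: dict) -> dict[str, float]:
    """Map each loaded model name to the fraction of its memory held in VRAM."""
    fractions = {}
    for model in ps_response.get("models", []):
        size = model.get("size", 0)
        vram = model.get("size_vram", 0)
        fractions[model.get("name", "unknown")] = vram / size if size else 0.0
    return fractions


def fetch_ps(base_url: str = "http://localhost:11434") -> dict:
    # Assumed base URL; adjust it to wherever your Ollama instance listens.
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        for name, frac in gpu_fraction(fetch_ps()).items():
            print(f"{name}: {frac:.0%} in VRAM")
    except (URLError, OSError) as exc:
        print(f"Could not reach Ollama: {exc}")
```

A model reported at 0% in VRAM is running entirely on CPU. Send a prompt first so that a model is actually loaded when you run the check.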

wishatch commented 10 months ago

> Make sure you're running on GPU.

Is there a way to minimize the configuration of genai-stack so that it runs at a reasonable speed without a GPU? It doesn't need to be super fast. GPUs are expensive, and it would be good to get familiar with this stack before purchasing a GPU card. Thanks very much for any advice.