andrewnguonly / Lumos

A RAG LLM co-pilot for browsing the web, powered by local LLMs

Experiment with Ollama API concurrency #170

Open · andrewnguonly opened this issue 3 months ago

andrewnguonly commented 3 months ago

Ollama server environment variables:
- `OLLAMA_NUM_PARALLEL`
- `OLLAMA_MAX_LOADED_MODELS`
- `OLLAMA_MAX_QUEUE`

Ollama release notes: https://github.com/ollama/ollama/releases/tag/v0.1.33
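
For context, a minimal sketch of how the effect of `OLLAMA_NUM_PARALLEL` can be observed against a local Ollama server. The launch command in the comment, the model name, and the prompt are assumptions (not from this thread), and Node 18+ is assumed for the global `fetch`. It fires several requests at Ollama's `/api/generate` endpoint at once; if the server handles requests in parallel, total wall time should be well below the sum of the single-request latencies.

```typescript
// Assumes the Ollama server was started with something like:
//   OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_MAX_QUEUE=512 ollama serve

const OLLAMA_URL = "http://localhost:11434/api/generate";

async function generate(id: number): Promise<void> {
  const start = Date.now();
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3", // placeholder model name
      prompt: "Why is the sky blue?",
      stream: false,
    }),
  });
  await res.json();
  console.log(`request ${id} finished in ${Date.now() - start} ms`);
}

async function main(): Promise<void> {
  // With OLLAMA_NUM_PARALLEL > 1, these requests should overlap
  // instead of queueing behind one another.
  await Promise.all([1, 2, 3, 4].map(generate));
}

main();
```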

andrewnguonly commented 3 months ago

Setting OLLAMA_MAX_LOADED_MODELS meaningfully improves the user experience. I can't tell any difference from OLLAMA_NUM_PARALLEL or OLLAMA_MAX_QUEUE.
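
A similar sketch for checking the effect of `OLLAMA_MAX_LOADED_MODELS`, alternating requests between two models (the model names are placeholders; Node 18+ assumed). If the limit lets both models stay resident, later iterations skip the model-load cost; if only one model may be loaded at a time, each switch pays it again.

```typescript
const OLLAMA_URL = "http://localhost:11434/api/generate";

async function timedGenerate(model: string): Promise<number> {
  const start = Date.now();
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: "Hello", stream: false }),
  });
  await res.json();
  return Date.now() - start;
}

async function main(): Promise<void> {
  // Alternate between two models. If OLLAMA_MAX_LOADED_MODELS allows both
  // to stay resident, timings should stabilize after the first round;
  // otherwise each switch evicts the other model and reloads it.
  for (let i = 0; i < 3; i++) {
    console.log("llama3:", await timedGenerate("llama3"), "ms");
    console.log("mistral:", await timedGenerate("mistral"), "ms");
  }
}

main();
```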

ColumbusAI commented 4 days ago

Thanks for sharing! I've set these variables now.