andrewnguonly / Lumos

A RAG LLM co-pilot for browsing the web, powered by local LLMs

Experiment with Ollama API concurrency #170

Open · andrewnguonly opened this issue 3 months ago

andrewnguonly commented 3 months ago

Ollama server environment variables:
- `OLLAMA_NUM_PARALLEL`
- `OLLAMA_MAX_LOADED_MODELS`
- `OLLAMA_MAX_QUEUE`

Ollama release notes: https://github.com/ollama/ollama/releases/tag/v0.1.33
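
For context, a minimal sketch of how the effect of `OLLAMA_NUM_PARALLEL` can be observed against a local Ollama server. The launch command in the comment, the model name, and the prompt are assumptions (not from this thread), and Node 18+ is assumed for the global `fetch`. It fires several requests at Ollama's `/api/generate` endpoint at once; if the server handles requests in parallel, total wall time should be well below the sum of the single-request latencies.

```typescript
// Assumes the Ollama server was started with something like:
//   OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_MAX_QUEUE=512 ollama serve

const OLLAMA_URL = "http://localhost:11434/api/generate";

async function generate(id: number): Promise<void> {
  const start = Date.now();
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3", // placeholder model name
      prompt: "Why is the sky blue?",
      stream: false,
    }),
  });
  await res.json();
  console.log(`request ${id} finished in ${Date.now() - start} ms`);
}

async function main(): Promise<void> {
  // With OLLAMA_NUM_PARALLEL > 1, these requests should overlap
  // instead of queueing behind one another.
  await Promise.all([1, 2, 3, 4].map(generate));
}

main();
```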

andrewnguonly commented 3 months ago

Setting OLLAMA_MAX_LOADED_MODELS meaningfully improves the user experience. I can't tell any difference from OLLAMA_NUM_PARALLEL or OLLAMA_MAX_QUEUE.
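
A similar sketch for checking the effect of `OLLAMA_MAX_LOADED_MODELS`, alternating requests between two models (the model names are placeholders; Node 18+ assumed). If the limit lets both models stay resident, later iterations skip the model-load cost; if only one model may be loaded at a time, each switch pays it again.

```typescript
const OLLAMA_URL = "http://localhost:11434/api/generate";

async function timedGenerate(model: string): Promise<number> {
  const start = Date.now();
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: "Hello", stream: false }),
  });
  await res.json();
  return Date.now() - start;
}

async function main(): Promise<void> {
  // Alternate between two models. If OLLAMA_MAX_LOADED_MODELS allows both
  // to stay resident, timings should stabilize after the first round;
  // otherwise each switch evicts the other model and reloads it.
  for (let i = 0; i < 3; i++) {
    console.log("llama3:", await timedGenerate("llama3"), "ms");
    console.log("mistral:", await timedGenerate("mistral"), "ms");
  }
}

main();
```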

ColumbusAI commented 4 days ago

Thanks for sharing! I've set these variables now.