danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/

[Bug]: Librechat doesn't wait until Ollama has loaded model #3330

Open · vlbosch opened this issue 4 months ago

vlbosch commented 4 months ago

What happened?

When starting a conversation with a model served via Ollama, the request sometimes stops prematurely with an ETIMEDOUT error. After resending the prompt, it is answered correctly, but only because the model has finished loading by then. With larger models (such as Command R Plus), it takes several retries before the model is loaded and the prompt is answered.
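For reference, the failure comes out of the `openai` Node SDK (see the stack trace in the error log below). A minimal sketch of the kind of client-side change that would help, assuming the request is built roughly like this (this is an illustration, not LibreChat's actual code), is to give the client a much longer timeout and more retries so the first request survives a cold model load:

// Sketch only: assumes the request goes through the `openai` Node SDK, as the
// stack trace in the error log suggests. The SDK accepts `timeout` (ms) and
// `maxRetries` options on the client constructor.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://host.docker.internal:11434/v1",
  apiKey: "ollama",          // Ollama ignores the key, but the SDK requires one
  timeout: 10 * 60 * 1000,   // allow up to 10 minutes for a cold model load
  maxRetries: 5,             // retry transient connection errors a few more times
});

// Warm-up request: the stream only starts once the model has finished loading.
const stream = await client.chat.completions.create({
  model: "gemma-2-27b-it-Q8_0_L.gguf:latest",
  messages: [{ role: "user", content: "ping" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}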

Steps to Reproduce

  1. Start a chat with an Ollama-served model
  2. Write a prompt for the model
  3. Send the prompt
  4. See the generic error message appear

What browsers are you seeing the problem on?

Safari

Relevant log output

Librechat error-log: 
{"cause":{"code":"ETIMEDOUT","errno":"ETIMEDOUT","message":"request to http://host.docker.internal:11434/v1/chat/completions failed, reason: read ETIMEDOUT","type":"system"},"level":"error","message":"[handleAbortError] AI response error; aborting request: Connection error.","stack":"Error: Connection error.\n    at OpenAI.makeRequest (/app/api/node_modules/openai/core.js:292:19)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async ChatCompletionStream._createChatCompletion (/app/api/node_modules/openai/lib/ChatCompletionStream.js:53:24)\n    at async ChatCompletionStream._runChatCompletion (/app/api/node_modules/openai/lib/AbstractChatCompletionRunner.js:314:16)"}

Librechat debug-log: 
2024-07-12T05:36:03.543Z debug: [OpenAIClient] chatCompletion
{
  baseURL: "http://host.docker.internal:11434/v1",
    modelOptions.model: "gemma-2-27b-it-Q8_0_L.gguf:latest",
    modelOptions.temperature: 0.7,
    modelOptions.top_p: 1,
    modelOptions.presence_penalty: 0,
    modelOptions.frequency_penalty: 0,
    modelOptions.stop: undefined,
    modelOptions.max_tokens: undefined,
    modelOptions.user: "668d8e6941ec54b9987a6bcc",
    modelOptions.stream: true,
    // 2 message(s)
    modelOptions.messages: [{"role":"system","name":"instructions","content":"Instructions:\nAntwoord uitsluitend in het Nederla... [truncated],{"role":"user","content":"Dit is een test."}],
}
2024-07-12T05:36:03.548Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:15.288Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:27.455Z debug: Making request to http://host.docker.internal:11434/v1/chat/completions
2024-07-12T05:36:38.743Z warn: [OpenAIClient.chatCompletion][stream] API error
2024-07-12T05:36:38.743Z error: [handleAbortError] AI response error; aborting request: Connection error.
2024-07-12T05:36:38.747Z debug: [AskController] Request closed
2024-07-12T06:27:35.786Z debug: [AskController]

Ollama-server log:
level=WARN source=server.go:570 msg="client connection closed before server finished loading, aborting load"
level=ERROR source=sched.go:480 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
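
The Ollama log shows the load being aborted because the client hangs up mid-load. As a workaround (not a fix in LibreChat itself), the model can be pre-loaded and kept resident using Ollama's documented keep_alive option, so the first chat request never hits a cold start. A sketch using plain fetch:

// Pre-load a model into Ollama and keep it in memory indefinitely.
// A /api/generate request with no prompt loads the model without generating
// anything, and keep_alive: -1 keeps it resident (per Ollama's API docs).
const res = await fetch("http://host.docker.internal:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma-2-27b-it-Q8_0_L.gguf:latest",
    keep_alive: -1, // -1 = keep loaded until Ollama restarts; default is 5 minutes
  }),
});
console.log(res.ok ? "Model pre-loaded" : `Pre-load failed: ${res.status}`);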

Screenshots

No response


vlbosch commented 4 months ago

I tested a bit more. Even when the model is already loaded, a timeout still occurs when it is given a large prompt. I think the timeout should be removed when using Ollama, and an error should only be shown when Ollama itself returns one.
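
To make that concrete, one shape such an option could take (the variable name below is hypothetical, not an existing LibreChat setting) is a per-endpoint override that is only applied when set:

// Hypothetical sketch: OLLAMA_REQUEST_TIMEOUT_MS is an invented name, not a
// real LibreChat setting. If set, it overrides the SDK request timeout for the
// custom endpoint; if unset, the SDK's default timeout applies unchanged.
import OpenAI from "openai";

const override = process.env.OLLAMA_REQUEST_TIMEOUT_MS;

const client = new OpenAI({
  baseURL: "http://host.docker.internal:11434/v1",
  apiKey: "ollama",
  ...(override ? { timeout: Number(override) } : {}), // milliseconds
});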

vlbosch commented 2 months ago

@danny-avila Did you have a chance to look into this issue? It would be great if the timeouts for custom endpoints could be changed and/or disabled completely. Thanks!