Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

[BUG]: OpenRouter always cuts off its response #1983

Closed: PierrunoYT closed this issue 3 months ago

PierrunoYT commented 3 months ago

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

[BUG]: OpenRouter always cuts off its response

[screenshot] [screenshot]

Are there known steps to reproduce?

No response

PierrunoYT commented 3 months ago

https://discord.com/channels/1114740394715004990/1266674095286784051 https://discord.com/channels/1114740394715004990/1248000016786919546

timothycarambat commented 3 months ago

What models are you using on OpenRouter so we can replicate? Latency matters a lot because OR does normally return finished responses :/

PierrunoYT commented 3 months ago

> What models are you using on OpenRouter so we can replicate? Latency matters a lot because OR does normally return finished responses :/

I used Claude and ChatGPT, but I think it happens on all models.

timothycarambat commented 3 months ago

Using meta-llama/llama-3-8b-instruct

[Screenshot 2024-07-29 at 11:23 AM]

Using meta-llama/llama-3.1-8b-instruct:free

[Screenshot 2024-07-29 at 11:24 AM]

Using GPT/Anthropic and other random models, I get responses as well.

I think what's going on here is the OpenRouter behavior we already have to work around here: https://github.com/Mintplex-Labs/anything-llm/blob/296f04156455346a35cec4440239325523265d33/server/utils/AiProviders/openRouter/index.js#L164

Your connection to the model may be just slow enough per token that you exceed the 500ms timeout, which leads to the response being cut off. I have 5GB download, so my internet speeds are quite fast, and that may not be the case on your end or others'.

We could make this timeout configurable?
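
For illustration, a configurable version of that workaround could look roughly like the sketch below. This is not the actual AnythingLLM code, and the `OPENROUTER_STREAM_TIMEOUT_MS` env var name is made up for the example; the point is only that the 500ms threshold becomes a number a slow connection can raise.

```js
// Sketch only (not the real implementation): wrap an async token stream so that
// if no chunk arrives within `ms`, the stream is treated as finished instead of
// hanging. OPENROUTER_STREAM_TIMEOUT_MS is an assumed env var name.
const timeoutMs = Number(process.env.OPENROUTER_STREAM_TIMEOUT_MS) || 500;

async function* withInactivityTimeout(stream, ms = timeoutMs) {
  const iterator = stream[Symbol.asyncIterator]();
  while (true) {
    // Race the next chunk against a timer; whichever settles first wins.
    const next = await Promise.race([
      iterator.next(),
      new Promise((resolve) =>
        setTimeout(() => resolve({ done: true, timedOut: true }), ms)
      ),
    ]);
    if (next.done) {
      if (next.timedOut)
        console.warn(`No chunk received for ${ms}ms, closing the stream early.`);
      return;
    }
    yield next.value;
  }
}
```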

PierrunoYT commented 3 months ago

Can you check this? [screenshot]

Not sure how we can fix this issue.

timothycarambat commented 3 months ago

That is not the issue.

PierrunoYT commented 3 months ago

> That is not the issue.

Okay, then he was wrong.

timothycarambat commented 3 months ago

Unless you are sending massive prompts, the max_tokens setting would not impact your output, and that is not how the examples showed you replicated the issue.
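
One way to double-check that is to look at the finish_reason OpenRouter returns on its OpenAI-compatible endpoint: "length" means the max_tokens cap was hit, while "stop" means the model finished on its own, which would point back at the client-side timeout. A rough sketch (model name and prompt are just placeholders):

```js
// Illustration only; run as an ES module on Node 18+ (for fetch and top-level await).
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "meta-llama/llama-3-8b-instruct", // one of the models tested above
    max_tokens: 1000,
    messages: [{ role: "user", content: "Write a long answer." }],
  }),
});
const data = await res.json();
// "length" = max_tokens cap was hit, "stop" = the model ended normally.
console.log(data.choices[0].finish_reason);
```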

PierrunoYT commented 3 months ago

@timothycarambat I set it to 1000 and I still get this: [screenshot]