Closed: PierrunoYT closed this issue 3 months ago
What models are you using on OpenRouter so we can replicate? Latency matters a lot here, because OpenRouter does normally return "done" finish responses :/
I used Claude and ChatGPT, but I think it will happen on all models.
Using meta-llama/llama-3-8b-instruct
Using meta-llama/llama-3.1-8b-instruct:free
Using GPT/Anthropic and other random models, I get responses as well.
I think this comes down to the OpenRouter quirk we have to work around here: https://github.com/Mintplex-Labs/anything-llm/blob/296f04156455346a35cec4440239325523265d33/server/utils/AiProviders/openRouter/index.js#L164
Your internet connection to the model may be just slow enough per token that you exceed the 500ms timeout, which leads to the response being cut off. I have 5GB download, so my internet speeds are quite fast, and that may not be the case on your end and others'.
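Roughly, that workaround behaves like the idle watchdog below. This is a minimal sketch with hypothetical names (`collectStream`, `streamChunks`, `onDone`), not the actual code in that file, but it shows how a slow token stream can trip a 500ms stale check and cut the reply short:

```js
const TIMEOUT_THRESHOLD_MS = 500;

// Collect a streamed reply, but give up if no chunk arrives within the threshold.
async function collectStream(streamChunks, onDone) {
  let fullText = "";
  let lastChunkTime = Date.now();
  let finished = false;

  // Every 500ms, check how long it has been since the last chunk arrived.
  // If the gap exceeds the threshold, treat the stream as stale and close it out.
  const watchdog = setInterval(() => {
    if (!finished && Date.now() - lastChunkTime >= TIMEOUT_THRESHOLD_MS) {
      finished = true;
      clearInterval(watchdog);
      onDone(fullText); // a slow connection lands here mid-response: the reply is cut off
    }
  }, TIMEOUT_THRESHOLD_MS);

  for await (const chunk of streamChunks) {
    if (finished) break; // the watchdog already gave up on this stream
    lastChunkTime = Date.now();
    fullText += chunk;
  }

  if (!finished) {
    finished = true;
    clearInterval(watchdog);
    onDone(fullText); // normal path: the model sent its finish event in time
  }
}
```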
Could we make this timeout configurable? A sketch of what that could look like is below.
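Something like this would do it (`OPENROUTER_TIMEOUT_MS` is a hypothetical env var name, not an existing AnythingLLM setting):

```js
// Read the idle threshold from the environment, falling back to the current 500ms default.
const DEFAULT_TIMEOUT_MS = 500;
const parsed = Number(process.env.OPENROUTER_TIMEOUT_MS);
const TIMEOUT_THRESHOLD_MS =
  Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
```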
Can you check this?
Not sure how we can fix this issue.
That is not the issue.
Okay then he was wrong.
Unless you are sending massive prompts, the max_tokens setting would not impact your output, and that is not what your examples of reproducing the issue showed anyway.
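For context, max_tokens is just a cap on the completion length in the request body; it does not cause mid-stream truncation. A sketch of where it sits in an OpenAI-compatible OpenRouter request (OPENROUTER_API_KEY here is a hypothetical env var for illustration):

```js
const body = {
  model: "meta-llama/llama-3.1-8b-instruct:free",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 1000, // caps how long the reply may be; unrelated to stalled streams
  stream: true,
};

const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify(body),
});
```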
@timothycarambat I set it to 1000 and I still get this
How are you running AnythingLLM?
AnythingLLM desktop app
What happened?
[BUG]: OpenRouter always cuts off its response
Are there known steps to reproduce?
No response