Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

Incomplete response from LM studio endpoint #485

Closed unseensholar closed 8 months ago

unseensholar commented 9 months ago

I am getting incomplete responses while using the LM Studio endpoint. The response cuts off midway while streaming, sometimes after the first word or after half a sentence. I am running on Docker.

shatfield4 commented 9 months ago

Hi @Daniel-Dan-Espinoza, which model are you using in LM Studio? This typically happens when models that are less optimized for chatting are being used. Also, are you running LM Studio locally on the same machine as your AnythingLLM docker container?

unseensholar commented 9 months ago

I was using the Starling model. I am running both LM Studio and the AnythingLLM Docker container on the same machine.

unseensholar commented 9 months ago

I tried LocalAI but have the same issue.

timothycarambat commented 9 months ago

When interacting with LM Studio we leave the entire run of inference on the LM Studio side. We simply pass along the input and wait for LM Studio to be done producing output.

While inference is running, it's likely the output being sent to AnythingLLM is not being dropped, but rather that LM Studio stops generating output and AnythingLLM assumes the response is done.

Can you confirm that the model is not continuing to generate a response when AnythingLLM says the response is complete? This would help determine whether the issue is with the model/config on LM Studio or with AnythingLLM.
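
For anyone landing here later, this is a rough sketch (not AnythingLLM's actual code) of how an OpenAI-compatible streaming client typically decides a chat response is finished; the URL, port, and model name are placeholders for a local LM Studio server:

```ts
// Sketch of an OpenAI-compatible streaming consumer. The endpoint and model
// name below are placeholders, not AnythingLLM internals.
async function streamChat(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let full = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // server closed the connection: client assumes "complete"
    buffer += decoder.decode(value, { stream: true });

    // SSE frames arrive as lines like `data: {...}` or `data: [DONE]`.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      const data = line.replace(/^data:\s*/, "").trim();
      if (!data) continue;
      if (data === "[DONE]") return full; // explicit end-of-stream sentinel
      full += JSON.parse(data).choices?.[0]?.delta?.content ?? "";
    }
  }
  return full;
}
```

The key point: when the backend stops sending chunks or closes the connection, a client like this cannot distinguish "the model finished" from "the backend quit early"; both end the loop the same way and look like a short but complete response.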

lunamidori5 commented 9 months ago

@timothycarambat this is the streaming bug fix we added for LocalAI. This is the fix working, but we need to learn why it's dropping the packets.

timothycarambat commented 9 months ago

@lunamidori5 We would need to confirm that the user is running the patched version, and if so, then yes, we definitely need to see why. To be fair, I have yet to replicate this issue with LocalAI (or LM Studio, for that matter).

lunamidori5 commented 9 months ago

@timothycarambat At least I'm not the only one with this bug (I am starting to think it may be the way some routers work...)

timothycarambat commented 8 months ago

Closing as stale

dlaliberte commented 7 months ago

I am seeing this problem using the latest version of AnythingLLM (0.2.0?). I saw it when using LM Studio, but then it seemed to clear up on its own, or maybe it was after I reset the chat in AnythingLLM. Then I got the empty-content complaint from LM Studio, decided enough was enough, and switched to Kobold.

Now I am seeing the one-token problem with Kobold, via the LocalAI LLM setting in AnythingLLM (chat model selection, which I can't seem to copy from the form, sigh: koboldcpp/dolphin-2.2.1-mistral-7b.Q5_K_S). Resetting the chat history doesn't help.

I'm fine with disabling streaming mode for now, but I don't see any way to do that in either AnythingLLM or Kobold.

Looks like this might be a relevant issue: https://github.com/LostRuins/koboldcpp/issues/669. So it may be a bug on the Kobold side.
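
Since there is no streaming toggle in the UI, one way to isolate the problem is to send the same prompt directly to the backend's OpenAI-compatible endpoint with streaming turned off. This is only a sketch: the URL/port assume KoboldCpp's default local server, and the model id is just copied from the comment above; adjust both for LM Studio or LocalAI.

```ts
// Hit the backend directly with stream: false to see whether it returns a
// complete answer on its own. Endpoint, port, and model id are assumptions.
async function nonStreamingCheck(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:5001/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "koboldcpp/dolphin-2.2.1-mistral-7b.Q5_K_S", // from the comment above
      stream: false, // one complete JSON response instead of a token stream
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const json = await res.json();
  console.log("finish_reason:", json.choices?.[0]?.finish_reason);
  console.log("content:", json.choices?.[0]?.message?.content);
}

nonStreamingCheck("Write three sentences about llamas.").catch(console.error);
```

If the full answer comes back here but still truncates when going through AnythingLLM, the problem is in the stream handling rather than in the model or the backend itself.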

lunamidori5 commented 7 months ago

@timothycarambat could you add a "no streaming" check mark to the llm screen?

timothycarambat commented 7 months ago

Is it because there is an issue with streaming or because certain models do not support it?

dlaliberte commented 7 months ago

> Is it because there is an issue with streaming or because certain models do not support it?

With Kobold, I was seeing the whole stream of tokens being generated, so clearly the model supports streaming and Kobold supports streaming, but on the AnythingLLM side the response was already done after the first token came in. So it seems like a timeout issue, not waiting long enough for the next token? It seems like there should be a special signal that the stream is finished, because otherwise how would anyone know?
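
For what it's worth, OpenAI-compatible streaming endpoints (which LM Studio, LocalAI, and recent KoboldCpp builds emulate) do carry an explicit end-of-stream signal: the final content chunk sets `finish_reason` to `"stop"` and is followed by a literal `data: [DONE]` frame. The frames below are illustrative, not captured from a real session:

```
data: {"choices":[{"delta":{"content":" world."},"finish_reason":null}]}

data: {"choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

If the backend closes the connection without ever sending those frames, the client has no way to tell a stalled generation from a finished one, which matches the behavior described above.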

lunamidori5 commented 7 months ago

> Is it because there is an issue with streaming or because certain models do not support it?

LocalAI and Google Gemini still have that streaming bug from before...