rsdmike opened 2 weeks ago
Did this timeout occur while your LLM was responding, or while you had a session open but had not yet sent a chat to the LLM? If your LLM is taking 10 minutes to reply, that is kind of an insane latency, but yes, it should not crash the server.
Kind of in-between. I was continuing a previous chat session/thread and had restarted the back-end (LocalAI). The first chat message to LocalAI loads the model into memory -- so it's not responding to messages yet, and yes, on CPU it takes a while to load a 70b into RAM -- but the subtle difference is that it's not inferencing yet. Take a look at my screenshot here to see the events: by the time the model has loaded, the AnythingLLM server has told me it has better things to do than wait (crashed) 😆.
Should mention, after it's loaded, it works fine with no issues.
Thanks for getting back to me 👍
Ah, so it's just the model taking so long to load that the request moves on. The 10 minutes is no coincidence either. For LocalAI we use openai's NPM package, which has a 10-minute timeout.
I would be nervous to have this be infinity, because then you can hang the entire call. Is it unreasonable to ask to mlock the model and basically prime it before using it to prevent this 😬?
I'm not super excited to accidentally lead to infinitely hanging requests for LocalAI!
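For reference, the trade-off being discussed can be sketched as a finite-but-configurable timeout rather than an infinite one. This is a generic Node sketch with a hypothetical `withTimeout` helper, not AnythingLLM's actual code (the openai NPM client also exposes its own `timeout` option, in milliseconds, on the client constructor):

```javascript
// Hypothetical helper: race a request promise against a finite deadline so a
// stalled model load fails cleanly instead of hanging the call forever.
function withTimeout(promise, ms) {
  let timer;
  const expiry = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer to avoid leaks.
  return Promise.race([promise, expiry]).finally(() => clearTimeout(timer));
}
```

A caller would then bump `ms` for known-slow backends instead of removing the limit entirely.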
Will add that regardless this should not exit the process - so that needs to be patched for sure
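A last-resort process guard is one way to cover the crash itself, independent of where the timeout is thrown. This is a generic Node sketch, not the actual patch:

```javascript
// Last-resort guard: record and log unexpected errors (like a socket timeout
// surfacing from a keep-alive agent) instead of letting them kill the server.
let lastError = null;
process.on("uncaughtException", (err) => {
  lastError = err; // a real server would log this and keep serving requests
  console.error("Uncaught exception, keeping process alive:", err.message);
});
process.on("unhandledRejection", (reason) => {
  console.error("Unhandled rejection, keeping process alive:", reason);
});
```

The usual caveat applies: this is a safety net for logging and survival, not a substitute for catching the error at its source.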
Yeah, I'd agree with that. I'm not too worried about the specific handling for LocalAI, just that the AnythingLLM server doesn't crash. I can handle model preloading and such, but when I'm downloading various models, trying them out, and loading them on the fly -- just not having a crash would be good enough.
This is interesting. I am trying to replicate this right now and I can't get that exact timeout to occur. It is always handled, which has me thinking this exception is being thrown somewhere else that is not being caught. Any exception during streaming would be caught and prevent an outright crash.
Right now I'm having trouble reproducing the exact error so I can locate its full stack trace and handle it.
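For illustration, this is the kind of containment being described: any throw inside the streaming loop lands in the catch rather than crashing the process. The flaky generator is a stand-in for a LocalAI socket timeout, not real AnythingLLM code:

```javascript
// Consume a streamed chat response; errors mid-stream are contained.
async function consumeStream(stream) {
  const chunks = [];
  try {
    for await (const chunk of stream) chunks.push(chunk);
  } catch (err) {
    // A socket timeout thrown here is handled, not fatal.
    chunks.push(`[stream error: ${err.message}]`);
  }
  return chunks;
}

// Fake stream that fails mid-way, standing in for the backend timing out.
async function* flakyStream() {
  yield "Hello";
  throw new Error("socket hang up");
}
```

If the crash survives a guard like this, the exception is most likely escaping from outside the streaming path.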
I'll see if I can run this locally, in debug mode, and give ya any more info. Also, I was 11 commits behind master, so lemme grab latest and try again as well.
Not sure if this adds any more info, but I'm on latest now and still able to reproduce. Looks like it originated from agentkeepalive in node_modules.
I'll keep playing around with this over the weekend. The workaround is easy enough: pre-load the model.
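The pre-load workaround can be scripted by firing one tiny completion at LocalAI before any real chat, so the 70b load happens up front. The URL and model name below are assumptions; substitute your own endpoint (LocalAI serves an OpenAI-compatible API):

```javascript
// Assumed LocalAI endpoint -- adjust host/port for your setup.
const LOCALAI_URL = "http://localhost:8080/v1/chat/completions";

// Build a one-token chat request; that's enough to force the model into RAM.
function preloadPayload(model) {
  return JSON.stringify({
    model,
    messages: [{ role: "user", content: "hi" }],
    max_tokens: 1,
  });
}

// Usage (Node 18+, model name is an example):
// fetch(LOCALAI_URL, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: preloadPayload("llama3-70b"),
// });
```

Run this once after restarting the backend, and the first real chat message no longer eats the load time.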
From what I saw in the lockfile, the openai npm module requires that sub-dependency. It's just frustrating because I can't determine where we call the library before it aborts so we can handle it!
How are you running AnythingLLM?
Docker (local)
What happened?
Using LocalAI for backend -- loading the llama3 70b model, Anything LLM container crashed with a socket timeout.
A timeout, I think, is fine; it takes a while to load. However, I didn't expect the container to crash -- I kind of expected just to re-initiate the thread.
Are there known steps to reproduce?
I think this should be reproducible with any load time of greater than 10 minutes.