Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
https://useanything.com

[Question]: Increase completion timeout to prevent crash? #1417

Open rsdmike opened 2 weeks ago

rsdmike commented 2 weeks ago

How are you running AnythingLLM?

Docker (local)

What happened?

Using LocalAI for backend -- loading the llama3 70b model, Anything LLM container crashed with a socket timeout.

2024-05-15 22:24:20 [TELEMETRY SENT] {
2024-05-15 22:24:20   event: 'sent_chat',
2024-05-15 22:24:20   distinctId: 'be3ac3d9-aa83-4458-ae1a-583a3fcc909b',
2024-05-15 22:24:20   properties: {
2024-05-15 22:24:20     multiUserMode: false,
2024-05-15 22:24:20     LLMSelection: 'localai',
2024-05-15 22:24:20     Embedder: 'openai',
2024-05-15 22:24:20     VectorDbSelection: 'lancedb',
2024-05-15 22:24:20     runtime: 'docker'
2024-05-15 22:24:20   }
2024-05-15 22:24:20 }
2024-05-15 22:24:20 [Event Logged] - sent_chat
2024-05-15 23:04:01 Cannonball results 3511 -> 470 tokens.
2024-05-15 23:04:01 Cannonball results 356 -> 286 tokens.
2024-05-15 23:04:53 [TELEMETRY SENT] {
2024-05-15 23:04:53   event: 'sent_chat',
2024-05-15 23:04:53   distinctId: 'be3ac3d9-aa83-4458-ae1a-583a3fcc909b',
2024-05-15 23:04:53   properties: {
2024-05-15 23:04:53     multiUserMode: false,
2024-05-15 23:04:53     LLMSelection: 'localai',
2024-05-15 23:04:53     Embedder: 'openai',
2024-05-15 23:04:53     VectorDbSelection: 'lancedb',
2024-05-15 23:04:53     runtime: 'docker'
2024-05-15 23:04:53   }
2024-05-15 23:04:53 }
2024-05-15 23:04:53 [Event Logged] - sent_chat
2024-05-15 23:25:00 node:internal/process/promises:288
2024-05-15 23:25:00             triggerUncaughtException(err, true /* fromPromise */);
2024-05-15 23:25:00             ^
2024-05-15 23:25:00 
2024-05-15 23:25:00 Error: Socket timeout
2024-05-15 23:25:00     at Socket.onTimeout (/app/server/node_modules/agentkeepalive/lib/agent.js:350:23)
2024-05-15 23:25:00     at Socket.emit (node:events:529:35)
2024-05-15 23:25:00     at Socket._onTimeout (node:net:598:8)
2024-05-15 23:25:00     at listOnTimeout (node:internal/timers:569:17)
2024-05-15 23:25:00     at process.processTimers (node:internal/timers:512:7) {
2024-05-15 23:25:00   code: 'ERR_SOCKET_TIMEOUT',
2024-05-15 23:25:00   timeout: 601000
2024-05-15 23:25:00 }
2024-05-15 23:25:00 
2024-05-15 23:25:00 Node.js v18.19.1

A timeout, I think, is fine; it takes a while to load. However, I didn't expect the container to crash. I kind of expected it to just re-initiate the thread.

Are there known steps to reproduce?

I think this should be reproducible with any model load time greater than 10 minutes.

timothycarambat commented 2 weeks ago

Did this timeout occur while your LLM was responding, or while you had a session open but had not yet sent a chat to the LLM? If your LLM is taking 10 minutes to reply, that is kind of an insane latency, but yes, it should not crash the server.

rsdmike commented 2 weeks ago

Kind of in-between. I was continuing a previous chat session/thread after restarting the backend (LocalAI). The first chat message to LocalAI loads the model into memory, so it's not responding to messages yet, and yes, on CPU it takes a while to load a 70b into RAM, but the subtle difference is that it's not inferencing yet. Take a look at my screenshot here to see the events: by the time the model has loaded, the AnythingLLM server has told me it has better things to do than wait (crashed) 😆.

[screenshot: chat thread events while the model loads]

I should mention that after it's loaded, it works fine, no issues.

Thanks for getting back to me 👍

timothycarambat commented 2 weeks ago

Ah, so it's just the model taking a long time to load before the request moves on. The 10 minutes is no coincidence either: for LocalAI we use OpenAI's NPM package, which has a 10-minute timeout.
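
For reference, a minimal sketch of how that 10-minute default could be raised, assuming the openai v4 Node SDK pointed at a LocalAI base URL (client options and environment variable names here are illustrative, not AnythingLLM's actual configuration):

// Sketch only: the openai v4 SDK defaults to a 10-minute request timeout.
// Raising it gives a slow-loading model more time, at the cost of holding
// the call open longer.
const OpenAI = require("openai");

const client = new OpenAI({
  baseURL: process.env.LOCAL_AI_BASE_PATH, // assumed env name, e.g. http://localai:8080/v1
  apiKey: process.env.LOCAL_AI_API_KEY ?? "not-needed",
  timeout: 30 * 60 * 1000, // 30 minutes instead of the default 10
  maxRetries: 0,
});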

I would be nervous to make this infinite, because then you can hang the entire call. Is it unreasonable to ask you to mlock the model and basically prime it before using it to prevent this 😬?

I'm not super excited to accidentally lead to infinitely hanging requests for LocalAI!

timothycarambat commented 2 weeks ago

Will add that, regardless, this should not exit the process, so that needs to be patched for sure.
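
One conventional way to keep a stray error from taking down the whole server is a process-level guard; this is a generic Node.js sketch, not the project's actual patch, and the real fix is still to catch the error at its source:

// Sketch only: last-resort handlers so an uncaught error is logged
// instead of exiting the process.
process.on("unhandledRejection", (reason) => {
  console.error("Unhandled rejection:", reason);
});

process.on("uncaughtException", (err) => {
  console.error("Uncaught exception:", err);
});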

rsdmike commented 2 weeks ago

Yeah, I'd agree with that. I'm not too worried about the specific handling for LocalAI, just that the AnythingLLM server doesn't crash. I can handle model preloading and such, but when I'm downloading various models, trying them out, and loading them on the fly, just not having a crash would be good enough.

timothycarambat commented 2 weeks ago

This is interesting. I am trying to replicate this right now and I can't get that exact timeout to occur. It is always handled, which has me thinking this exception is being thrown somewhere else where it is not being caught. Any exception during streaming would be caught and would prevent an outright crash.

Right now I'm having trouble reproducing the exact error so I can locate its full stack trace and handle it.
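
For what it's worth, the `fromPromise` flag in the crash trace points at an unhandled promise rejection rather than an error thrown inside an awaited call. A minimal, generic illustration of the difference (not AnythingLLM code; the error message is just mimicking the log):

// An error rejected inside an awaited call is caught by the surrounding try/catch:
async function handled() {
  try {
    await new Promise((_, reject) =>
      setTimeout(() => reject(new Error("Socket timeout")), 100)
    );
  } catch (e) {
    console.error("caught:", e.message); // logged, no crash
  }
}

// A rejection from a promise nobody awaits escapes every try/catch and, on
// Node 15+, reaches triggerUncaughtException(err, true /* fromPromise */)
// and exits the process:
function unhandled() {
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error("Socket timeout")), 100)
  );
}

handled();
unhandled(); // crashes the process with an unhandled rejection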

rsdmike commented 2 weeks ago

I'll see if I can run this locally in debug mode and give you any more info. Also, I was 11 commits behind master, so let me grab the latest and try again as well.

rsdmike commented 2 weeks ago

Not sure if this adds any more info, but using the latest I am still able to reproduce it. It looks like the error originated from agentkeepalive in node_modules. [screenshots: debugger stack trace]

I'll keep playing around with this over the weekend. The workaround is easy enough: pre-load the model.
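
A warm-up request along these lines is one way to pre-load the model before opening a chat in AnythingLLM. This is a sketch that assumes LocalAI's OpenAI-compatible endpoint at a local address and a model name of your choosing:

// Sketch only: send one tiny completion so LocalAI loads the model into RAM
// before AnythingLLM's 10-minute request timeout comes into play.
async function warmUp(baseUrl = "http://localhost:8080", model = "llama3-70b") {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 1,
    }),
    // allow well past the model load time; adjust to your hardware
    signal: AbortSignal.timeout(60 * 60 * 1000),
  });
  console.log("warm-up status:", res.status);
}

warmUp().catch(console.error);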

timothycarambat commented 2 weeks ago

From what I saw in the lockfile, the openai NPM module requires that sub-dependency. It's just frustrating because I can't determine where we call the library before it aborts so that we can handle it!