Open vlbosch opened 2 months ago
We could automatically cancel the inference when the connection is lost. What do you think?
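As a rough sketch of the idea (this is an assumption about how it could look, not exo's actual code — the handler, `write` callback, and token generator here are hypothetical stand-ins): in an asyncio-based streaming handler, a dropped client typically surfaces as a `ConnectionResetError` on the next write, which is a natural point to stop the generator instead of letting inference run on in the background.

```python
import asyncio

async def generate_tokens():
    # Hypothetical stand-in for model inference: yields tokens until closed.
    for i in range(1000):
        await asyncio.sleep(0)  # yield control, as real inference would
        yield f"token-{i}"

async def stream_response(write, cancelled):
    """Stream tokens via `write`; stop inference when the client drops.

    `write` is a stand-in for writing a chunk to the HTTP response;
    a lost connection surfaces there as ConnectionResetError.
    """
    gen = generate_tokens()
    try:
        async for token in gen:
            await write(token)
    except ConnectionResetError:
        # Client went away: close the generator so inference stops
        # instead of continuing until ctrl+c.
        await gen.aclose()
        cancelled.append(True)

async def main():
    sent = []
    cancelled = []

    async def flaky_write(token):
        # Simulated client that disconnects after receiving 3 tokens.
        if len(sent) >= 3:
            raise ConnectionResetError
        sent.append(token)

    await stream_response(flaky_write, cancelled)
    return sent, cancelled

sent, cancelled = asyncio.run(main())
print(len(sent), cancelled)  # 3 [True]
```

The same shape works if inference runs in a separate `asyncio.Task`: catch the write error and call `task.cancel()` on it.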
I agree that inference should automatically be cancelled as soon as the connection is lost. However, I am more curious about why the connection drops in the first place. At first I thought it was a completion timeout or something similar, since it mostly happens with large models producing long outputs. But it doesn't occur with mlx_lm.server, which leads me to think it's related to exo. How can I help triage this?
With larger models, like Mistral-Large, the UI client I am using (for example Typing Mind) loses its connection to the endpoint, but generation continues in the background and doesn't stop. Typing Mind shows the following error: "Something went wrong. This could be a temporary network connection issue. Please try again or contact support. Opening the console might help clarifying the issue. Technical detail:Load failed"
The terminal window doesn't show any errors. The generation continues for as long as I let exo run and only stops after ctrl+c. After stopping the process, I get the following (which just seems appropriate for manually stopping during a generation): File "/opt/homebrew/Caskroom/miniconda/base/envs/exo/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete raise RuntimeError('Event loop stopped before Future completed.') RuntimeError: Event loop stopped before Future completed.
Running with the latest commit 62e3726 on macOS 15 DP5. Please let me know if you need any more information.