Open vlbosch opened 2 months ago
We could automatically cancel the inference when the connection is lost. What do you think?
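As a rough sketch of the idea (this is an assumption about how it could look, not exo's actual code — the handler, `write` callback, and token generator here are hypothetical stand-ins): in an asyncio-based streaming handler, a dropped client typically surfaces as a `ConnectionResetError` on the next write, which is a natural point to stop the generator instead of letting inference run on in the background.

```python
import asyncio

async def generate_tokens():
    # Hypothetical stand-in for model inference: yields tokens until closed.
    for i in range(1000):
        await asyncio.sleep(0)  # yield control, as real inference would
        yield f"token-{i}"

async def stream_response(write, cancelled):
    """Stream tokens via `write`; stop inference when the client drops.

    `write` is a stand-in for writing a chunk to the HTTP response;
    a lost connection surfaces there as ConnectionResetError.
    """
    gen = generate_tokens()
    try:
        async for token in gen:
            await write(token)
    except ConnectionResetError:
        # Client went away: close the generator so inference stops
        # instead of continuing until ctrl+c.
        await gen.aclose()
        cancelled.append(True)

async def main():
    sent = []
    cancelled = []

    async def flaky_write(token):
        # Simulated client that disconnects after receiving 3 tokens.
        if len(sent) >= 3:
            raise ConnectionResetError
        sent.append(token)

    await stream_response(flaky_write, cancelled)
    return sent, cancelled

sent, cancelled = asyncio.run(main())
print(len(sent), cancelled)  # 3 [True]
```

The same shape works if inference runs in a separate `asyncio.Task`: catch the write error and call `task.cancel()` on it.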
I agree that inference should automatically be cancelled as soon as the connection is lost. However, I am more curious about why the connection drops in the first place. At first I thought it was a completion timeout or something similar, since it mostly happens with large models producing long outputs. But it doesn't occur with mlx_lm.server, which leads me to think it's related to exo. How can I help triage this?
With larger models, like Mistral-Large, the UI client I am using (for example Typing Mind) loses its connection to the endpoint, but generation continues in the background and doesn't stop. Typing Mind shows the following error: "Something went wrong. This could be a temporary network connection issue. Please try again or contact support. Opening the console might help clarifying the issue. Technical detail:Load failed"
The terminal window doesn't show any errors. The generation continues for as long as I let exo run and only stops after ctrl+c. After stopping the process, I get the following (which just seems appropriate for manually stopping during a generation): File "/opt/homebrew/Caskroom/miniconda/base/envs/exo/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete raise RuntimeError('Event loop stopped before Future completed.') RuntimeError: Event loop stopped before Future completed.
Running with the latest commit 62e3726 on macOS 15 DP5. Please let me know if you need any more information.