Graceful handling of websocket closure

barsuna commented 4 months ago

Excellent project!

I've hit an issue when trying to test gpt-researcher with a locally running llama3. When using Firefox (125.0.2) on Linux, I see that during long research the browser closes the websocket (ping timeout?).

It shows up on the console like this:

🤔 Generating subtopics...

🤖 Calling gpt-4o...

📋Subtopics: subtopics=[...]
INFO:     connection closed

When this happens, the next message that the agent tries to print causes the agent to 'freeze' - nothing else happens after that point, e.g.

🔎 Starting the research task for '...'...

This does not happen with other browsers (I tried Brave).

I cannot say I understand well what the relevant code does, but looking at server.py, I see that

...
    except WebSocketDisconnect:
        await manager.disconnect(websocket)
...

is getting invoked. I'm just not sure that what it does really handles the situation well: it seems that this exception is triggered with some delay relative to the browser closing the WS, and by that time there is already a call to await stream_output which never returns.
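One way to keep a send on a dead socket from hanging the agent is to bound each send with a timeout and treat failure as "client gone". The sketch below is only an illustration: `safe_send` and `ClosedSocket` are hypothetical names, the stand-in socket simply never completes its send, and which exceptions a real framework raises after a close is an assumption here.

```python
import asyncio


class ClosedSocket:
    """Hypothetical stand-in for a websocket whose peer has gone away."""

    async def send_json(self, data):
        # Simulates the hang described above: the send never completes.
        await asyncio.Event().wait()


async def safe_send(websocket, payload, timeout=5.0):
    """Try to send; return False instead of hanging if the socket is dead."""
    try:
        await asyncio.wait_for(websocket.send_json(payload), timeout=timeout)
        return True
    except (asyncio.TimeoutError, RuntimeError, ConnectionError):
        # Assumption: a timeout or one of these errors means the client is gone.
        return False


async def demo():
    payload = {"type": "logs", "output": "Starting the research task..."}
    return await safe_send(ClosedSocket(), payload, timeout=0.1)
```

With a guard like this the research task could keep running, or stop cleanly, when the browser has dropped the connection, instead of waiting forever on the next log line.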

For Firefox I found a workaround, which is to increase 'network.websocket.timeout.ping.response' from 10 to 100 (this is set via the 'about:config' URL).

ElishaKay commented 4 months ago

Good eye @barsuna

It seems that by the time the WebSocket disconnect server method is run, the game is already done.

Based on this top-ranking Stack Overflow thread, the recommendation is to keep a steady ping-pong game going between the client and server every 3 seconds.

The challenge with that is that the client-server thread seems to be locked while a report is being generated. Meaning, the server doesn't have capacity to respond with a pong while it's generating a report.

One potential solution:

The client should only open the websocket the moment the user requests a report.
The server should keep pinging the client every 3 seconds after being instructed to generate a report (and the client should keep ponging back).
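The second point could be sketched with plain asyncio as a background task that pings while the report job runs and is cancelled when the job ends. `heartbeat`, `run_with_heartbeat`, `fake_ping` and `fake_report` are hypothetical names for illustration, not part of gpt-researcher, and the sketch only works if the job itself yields to the event loop.

```python
import asyncio


async def heartbeat(send_ping, interval=3.0):
    """Call send_ping() every `interval` seconds until cancelled."""
    while True:
        await send_ping()
        await asyncio.sleep(interval)


async def run_with_heartbeat(job, send_ping, interval=3.0):
    """Await the coroutine `job` while pinging in the background."""
    pinger = asyncio.create_task(heartbeat(send_ping, interval))
    try:
        return await job
    finally:
        # Stop pinging once the job is done, whether it succeeded or failed.
        pinger.cancel()
        await asyncio.gather(pinger, return_exceptions=True)


async def demo():
    pings = []

    async def fake_ping():
        pings.append("ping")  # a real server would send a frame here

    async def fake_report():
        await asyncio.sleep(0.1)  # placeholder for report generation
        return "report"

    result = await run_with_heartbeat(fake_report(), fake_ping, interval=0.02)
    return result, len(pings)
```

The short interval in `demo` is only there to keep the example fast; the 3 seconds suggested above is the default.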

barsuna commented 4 months ago

Thanks @ElishaKay, indeed the timeout happens during busy time on the server side (generally during subtopic generation for me). The computer itself is nowhere near overloaded (CPU/memory/IO-wise) - I tend to think that with asyncio the parts servicing pings etc. may not get enough time in critical moments (though it's pure speculation).
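To illustrate how that speculation could play out (without claiming this is what gpt-researcher does), the sketch below counts how often a background ticker runs while a synchronous step is in progress, first called directly inside a coroutine and then offloaded with `asyncio.to_thread`. `blocking_step` is a hypothetical placeholder for any blocking call.

```python
import asyncio
import time


def blocking_step(seconds):
    """Hypothetical placeholder for synchronous work, e.g. a blocking API call."""
    time.sleep(seconds)


async def count_ticks(run_step, interval=0.01):
    """Count how often a background ticker runs while run_step() is in progress."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(interval)

    task = asyncio.create_task(ticker())
    await run_step()
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return ticks


async def blocking():
    blocking_step(0.2)  # holds the event loop; the ticker cannot run


async def offloaded():
    await asyncio.to_thread(blocking_step, 0.2)  # event loop stays free


async def demo():
    return await count_ticks(blocking), await count_ticks(offloaded)
```

In the blocking case the ticker gets essentially no turns even though the machine is idle, which would match a ping going unanswered on an otherwise unloaded computer.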

I think the WS is used for logging and allows the user to follow what is going on, so opening it only for the report will deprive the user of the extra verbosity, unless you mean reopening? I'd vote for exposing timing controls (in my limited understanding of WS the timeout is negotiated) - we can allow the user to increase it if the default is not stable.
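If the server is started with uvicorn, one way timing controls could be exposed is through its `ws_ping_interval` and `ws_ping_timeout` settings, read from the environment. This is a sketch under assumptions: the environment variable names, the fallback value of 20 seconds and the app path in the final comment are all hypothetical. As far as I understand, these settings govern pings sent by the server and do not change the Firefox preference mentioned earlier, which is a client-side setting.

```python
import os


def ws_timing_from_env(env=None):
    """Build keyword arguments for uvicorn's websocket keepalive settings.

    WS_PING_INTERVAL and WS_PING_TIMEOUT are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return {
        "ws_ping_interval": float(env.get("WS_PING_INTERVAL", "20")),
        "ws_ping_timeout": float(env.get("WS_PING_TIMEOUT", "20")),
    }


# Hypothetical usage (the app path is an assumption):
# uvicorn.run("server:app", **ws_timing_from_env())
```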

ElishaKay commented 3 months ago

Better idea: open the websocket connection the moment the report task begins (i.e. the user submits a query) and close it the moment the report completes.

barsuna commented 3 months ago

idk, there is still a risk of timeout (though a lesser one perhaps?). I wonder if there are ways to control the length of the timeout on the gpt-researcher side. Or maybe a way to reopen the WS on closure?

ElishaKay commented 3 months ago

This commit should fix the websocket errors

The logic is: a new websocket opens when the first query is submitted, and it re-opens automatically if the websocket is in a closed state when later queries are submitted.
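The commit itself is not reproduced here, and the real change presumably lives in the browser client. As a rough illustration of the described logic only, the sketch below opens a connection on the first query and reopens it if it is found closed on a later one. `FakeSocket` and `ReopeningClient` are hypothetical names, and the socket is a stand-in that only tracks open/closed state.

```python
class FakeSocket:
    """Hypothetical stand-in for a websocket; only tracks open/closed state."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def send(self, message):
        if self.closed:
            raise ConnectionError("socket is closed")
        self.sent.append(message)

    def close(self):
        self.closed = True


class ReopeningClient:
    """Opens a socket on first use and reopens it if it was closed since."""

    def __init__(self, connect):
        self._connect = connect  # factory that returns a new open socket
        self.socket = None
        self.opened = 0

    def submit(self, query):
        if self.socket is None or self.socket.closed:
            self.socket = self._connect()
            self.opened += 1
        self.socket.send(query)


client = ReopeningClient(FakeSocket)
client.submit("first query")   # opens the first socket
client.socket.close()          # simulate the connection being dropped
client.submit("second query")  # detects the closed state and reopens
```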

@assafelovic - safe to close this issue

assafelovic / gpt-researcher

Graceful handling of websocket closure #519