Pylons / waitress

Waitress - A WSGI server for Python 3
https://docs.pylonsproject.org/projects/waitress/en/latest/
Other
1.44k stars 164 forks source link

Closing Flask-SSE pending connections cause Waitress Task queue error #381

Closed YuqiaoS closed 2 years ago

YuqiaoS commented 2 years ago

Waitress 2.1.1. Flask-SSE 1.0.0 Flask 1.1.4

After looking at the source code of Flask-SSE and searching for Flask response stream generator, I luckily found that this issue was reported before, and resolved...partially? https://github.com/Pylons/waitress/issues/236

Here's how to reproduce this issue.

  1. Frontend needs to initiate SSE with new EventSource()
  2. Backend registers the Flask-SSE blueprint without sending out any msg and start Waitress app with a few threads e.g. > 3
  3. Go to browser and refresh the page a few times and you'll probably see server logs "Task Queue depth ...." warnings

However.. if you start sending some msg from the backend after making connection, then the backend correctly logs "Client disconnected.." Open the dev tools, and you'll see in network requests that what happens is that XHR status is pending until msgs are sent out and then turns OK and if you close the connection afterwards, the status becomes the red cancelled. So my guess is that the pending connections are not properly terminated by Waitress when client connections close.

Could this issue be fixed? I think Waitress should dispose incomplete connections as well? A work around right now would be to send out heartbeat messages so that the connection completes with ok status, instead of pending.

mmerickel commented 2 years ago

Just to level-set the primary issue here is not the socket cleanup but rather the thread being consumed. Waitress could cleanup the socket and might already be - but the queue depth warning is about the thread resource.

Focusing on the thread, waitress is pretty helpless here because the app is in control and waitress cannot wake it up. The app itself would need to be yielding empty strings to periodically poll the status and then shutdown. This is not waitress’ fault entirely - the wsgi spec allows the app total control until it returns and it is allowed to continue doing work even after the socket is closed.

I’d suggest looking at how the SSE library could be adjusted to cleanup more nicely by doing something like yielding, or waitress also added a different feature recently that could be of interest which is an environ callable that can return the status of the socket. I don’t recall the name of it while typing from my phone.

digitalresistor commented 2 years ago

As @mmerickel mentioned this is due to the way that WSGI works, and is not something that can be fixed in Waitress. Once a thread is inside user code there is no way to wake it up or cancel the thread.

Waitress does not clean up the socket until it knows it has to be cleaned up (when the channel is shut down). While waiting for a response from the WSGI application the socket is not select()'ed on, unless you configure channel_request_lookahead to be 1 or larger.

The application can opt-in to checking if the remote client has disappeared, and have waitress continue to select() on the socket, with the knowledge that it will look-ahead a max number of requests (not really an issue when using SSE since the remote client shouldn't be sending multiple requests while waiting for SSE events to come back from the server).

Here is an example of using that feature: waitress.client_disconnected inside your app.

import logging
import time

log = logging.getLogger(__name__)

def application(environ, start_response):
    check = environ["waitress.client_disconnected"]

    for i in range(10):
        # do some computation
        log.debug("Starting computation for %d", i)
        time.sleep(2)
        log.debug("Completed computation for %d", i)

        if check():
            log.debug("Remote client went away, processed %d items", i)
            break

    start_response(
        "200 OK",
        [
            ("Content-Type", "application/octet-stream"),
        ],
    )

    return [b"work completed"]

if __name__ == "__main__":
    import waitress

    logging.basicConfig(
        format="%(asctime)-15s %(levelname)-8s %(name)s %(message)s",
        level=logging.DEBUG,
    )

    waitress.serve(application, channel_request_lookahead=5)
YuqiaoS commented 2 years ago

Ok so the pending/connected connection termination behavior (which you guys seem to have missed) I cited previously is not so much due to backend but the frontend!!

I'm switching web app pages and on component unmount, I call eventSource.close(). And here's what's interesting... EventSource connection has a readyState prop similar to XHR: CONNECTING (0), OPEN (1), or CLOSED (2). For those pending connections, which are the ones that have not received response from the server, the status is probably still on 0, and when I call close(), it can't cuz it's still connecting...and lo and behold this is exactly what it says in the docs..kinda.

The close() method of the EventSource interface closes the connection, if one is made, and sets the EventSource.readyState attribute to 2 (closed).

JS

onClick = {
let e = new EventSource('/stream'); 
callUrl('/send-me-something')
}

another btn onClick = { e.close() }

Flask

@app.route('send-me-something')
...
sse.publish('yo sup', 'greeting')
...

Start clicking and now you'll see "Client disconnected while serving..." warnings and if you do it fast, you'll sometimes see "Task queue warning" once a few times. But the server doesn't hang. And whether you simply refresh the page or with close with scripts, the result seems to be the same in that pending connections can't be closed by the client and cause the hang.

So in a way this is possibly the browser's fault instead? How should the client terminate an opened connection in CONNECTING state properly.

digitalresistor commented 2 years ago

The Flask SSE app is called by Waitress, even if the browser sends a message in that same socket to close the connection, or closes the socket on the remote end (no matter how that socket is closed, user pulled the cable, user closed their browser, user hit Ctrl + C in curl) waitress can not notify the application that is running about this in any way. That thread will forever be hanging in user code and waitress can't interrupt it.

For example, if you have this:

import logging
import time

log = logging.getLogger(__name__)

def application(environ, start_response):
    while True:
       echo "Looping"
       time.sleep(2)

    start_response(
        "200 OK",
        [],
    )

    return [b"never reached"]

if __name__ == "__main__":
    import waitress

    logging.basicConfig(
        format="%(asctime)-15s %(levelname)-8s %(name)s %(message)s",
        level=logging.DEBUG,
    )

    waitress.serve(application)

And you spawn this with 4 threads, even if the remote client goes away (and disconnects), the WSGI threads will continue to spin forever since there is no response and there is nothing waitress can do about it since it is inside user code.

Same exists here:

https://github.com/singingwolfboy/flask-sse/blob/main/flask_sse.py#L159-L167

If Flask SSE is waiting for new events and those events never arrive, it won't ever yield from its iterator and thus it won't ever return to waitress. It's stuck waiting forever in Flask SSE code. This is standard WSGI behavior.

In your case when you open up a new tab/refresh the page/whatever a new SSE connection is created, and a new event is emitted somewhere that forces the WSGI app thread that is stuck waiting for new messages to yield which then ends up back in Waitress where Waitress can then know that the remote closed the connection.

YuqiaoS commented 2 years ago

Ok so I think I can say fairly confidently after some further testing, this is true

pending connections can't be closed by the client and cause the hang.

You'll see this better if you actually try to implement this. So you are right that new msgs will wake up old ones, but the catch is that only the ones that have been connected before (the ones that have received msgs from server). Verify this by creating new EventSource's. and close them without calling 'send-me-something' url, after a few, (less than the available threads so that server doesn't hang), then you call 'send-me-something', you'll see nothing in the server logs. then you close again, then hit 'send-me-something', now you'll see the 'client disconnected' msg. So inferring from this, only the ones that have been actually connected can be woken up and Waitress proceeds raise the ClientDisconnected exception..

mmerickel commented 2 years ago

@YuqiaoS simply put this isn't something that waitress can fix for you. Flask-SSE should be updated to yield empty messages to the client periodically at which point waitress can raise the ClientDisconnected exception in the thread to tell it to clean up. However as it's currently implemented it will have this problem on any threaded wsgi server (not just waitress).

YuqiaoS commented 2 years ago

So right now the work-around is to ask the server to send something every time I init an EventSource. Still the thing I'd like to point out is that there's a distinction between the never connected ones and the .actually connected ones. The former ones seem to linger on and while the latter ones can be woken up later. So the issue is really that there's a difference in how Waitress or Flask handles these connections. I do not possess enough server knowledge to infer the cause for this, but I do think there's some internal mechanisms that causes the server to not be able to return to these lingering threads.

You are right that the actual fix would be to have the threads check and terminate, or else we would always have the lingering threads that don't get cleaned up till the next time SSE has messages sent out. (but this may possibly be done on the application level code by checking your Environ for example).

mmerickel commented 2 years ago

You seem to be debugging all of this by looking at the event source object state in the browser which may not have even established a connection to the server. Have you checked that this is true? That a socket is open and waitress is not closing it in the scenario that no messages have been sent yet?

I have no way to reproduce this as I don’t use any of the referenced libraries and it would take write some time to whip up an environment similar to your example steps from scratch.

YuqiaoS commented 2 years ago

Yes, that's what I believe is what I've observed. You can see the status of the event source connection (pending, ok, canceled, failed). Take a look at the Flask-SSE 's blueprint source code, and you'll see it's just a generator listening to Redis and yields every time there's a msg, much like the code given above without the checks. I can provide source code later if you'd like. They are very simple :)

YuqiaoS commented 2 years ago

@mmerickel Ok, I was mistaken, in my previous conclusion, though there is a distinction between the two. In my original js code I had this:

new EventSource()
// this connection happens first
callServer()

I thought about race conditions may happen but this kinda fixed the issue (partially as I verified later in my repo) but it led me to the wrong conclusion. so in effect callServer() flushes out previously unconnected connections before new EventSource connection is made. But one call sometimes isn't enough. In the case of connected ones, it actually requires 3 more calls to terminate the connection. You can see this in my example repo here.

So yea this means the thread termination mostly depends on on how app libs yield stream responses, though I wonder why it required so many more server calls for the app return back to Waitress :)