abourget / gevent-socketio

Official repository for gevent-socketio
http://readthedocs.org/docs/gevent-socketio/en/latest/
BSD 3-Clause "New" or "Revised" License

Graceful shutdown always times out under gunicorn #147

Open stevesw opened 11 years ago

stevesw commented 11 years ago

When gunicorn receives SIGQUIT it will initiate a graceful shutdown of its worker processes, wait some period of time (as determined by the graceful_timeout setting) to allow the workers to finish existing requests, and then terminate the workers.

If there are no workers processing requests, it should basically finish immediately. But when running the socketio.sgunicorn.GeventSocketIOWorker worker, it always waits until graceful_timeout has elapsed before terminating.

I think this is caused by a few differences between the implementation of GeventSocketIOWorker and that of its parent class.

Here's the parent gunicorn.workers.ggevent.GeventWorker:

try:
    # Stop accepting requests
    [server.stop_accepting() for server in servers]

    # Handle current requests until graceful_timeout
    ts = time.time()
    while time.time() - ts <= self.cfg.graceful_timeout:
        accepting = 0
        for server in servers:
            if server.pool.free_count() != server.pool.size:
                accepting += 1

        # if no server is accepting a connection, we can exit
        if not accepting:
            return

        self.notify()
        gevent.sleep(1.0)

    # Force kill all active the handlers
    self.log.warning("Worker graceful timeout (pid:%s)" % self.pid)
    [server.stop(timeout=1) for server in servers]
except:
    pass

And here is socketio.sgunicorn.GeventSocketIOWorker:

try:
    # Stop accepting requests
    [server.stop_accepting() for server in servers]

    # Handle current requests until graceful_timeout
    ts = time.time()
    while time.time() - ts <= self.cfg.graceful_timeout:
        accepting = 0
        for server in servers:
            if server.pool.free_count() == server.pool.size: # A
                accepting += 1

        if not accepting:
            return

        self.notify()
        gevent.sleep(1.0)

    # Force kill all active the handlers
    self.log.warning("Worker graceful timeout (pid:%s)" % self.pid)
    server.stop(timeout=1) # B
except:
    pass

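Note the flipped comparison (marked A): an idle server, whose pool has every slot free, is counted as still busy, so the loop never returns early while the worker is idle. Note also that the force-kill step stops only one server instead of all of them (marked B). Fixing these two allows the workers to terminate immediately on SIGQUIT instead of always waiting for graceful_timeout to elapse.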
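To make the two fixes concrete, here is a minimal sketch of the corrected loop logic, written so it runs without gunicorn or gevent. `FakePool`, `FakeServer`, `busy_servers` and `drain` are hypothetical names made up for this illustration; only `free_count()`, `size` and `stop(timeout=...)` mirror the calls in the snippets above. It is not the actual patch.

```python
import time


class FakePool:
    """Hypothetical stand-in for a server's pool (not a gevent class)."""

    def __init__(self, size, busy=0):
        self.size = size
        self.busy = busy

    def free_count(self):
        return self.size - self.busy


class FakeServer:
    """Hypothetical stand-in for a server that owns a pool."""

    def __init__(self, pool):
        self.pool = pool
        self.stopped = False

    def stop(self, timeout=None):
        self.stopped = True


def busy_servers(servers):
    """Count servers with at least one handler still in flight (fix A)."""
    return sum(1 for s in servers if s.pool.free_count() != s.pool.size)


def drain(servers, graceful_timeout, sleep=time.sleep, poll=1.0):
    """Wait for servers to go idle; force-stop all of them on timeout.

    Returns True if every server was idle before the deadline.
    """
    deadline = time.monotonic() + graceful_timeout
    while time.monotonic() <= deadline:
        if busy_servers(servers) == 0:
            return True  # nothing in flight, exit immediately
        sleep(poll)
    for s in servers:  # fix B: stop every server, not just one
        s.stop(timeout=1)
    return False
```

With the comparison from the parent class, an idle worker leaves on the first pass through the loop; only a worker with requests in flight reaches the force-stop step.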

fabswt commented 8 months ago

experiencing the same issue (on latest: gevent 23.9.1), which ruins productivity for local development:

[2024-02-14 05:28:58 +0100] [1807] [INFO] Worker reloading: /var/www/html/gliglish/application/routes/some_route.py modified
[2024-02-14 05:29:28 +0100] [1807] [WARNING] Worker graceful timeout (pid:1807)
[2024-02-14 05:29:29 +0100] [1807] [INFO] Worker exiting (pid: 1807)
[2024-02-14 05:29:29 +0100] [1812] [INFO] Booting worker with pid: 1812

note the reload time: it takes 30 seconds for the app to reload (because gunicorn is waiting on gevent to shut down, if I got this right).
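For reference, the delay can be read straight off the log timestamps above. A small sketch (the helper name `seconds_between` is made up for this example) shows that the wait matches gunicorn's default graceful_timeout of 30 seconds:

```python
from datetime import datetime

# Timestamp layout used in the gunicorn log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S %z"


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


reload_detected = "2024-02-14 05:28:58 +0100"
graceful_timeout_hit = "2024-02-14 05:29:28 +0100"
new_worker_booted = "2024-02-14 05:29:29 +0100"

print(seconds_between(reload_detected, graceful_timeout_hit))  # 30.0
print(seconds_between(reload_detected, new_worker_booted))     # 31.0
```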

workaround, until this is fixed: start gunicorn with --graceful-timeout 0. this way, you still get the warning ([WARNING] Worker graceful timeout) but don't have to wait for the app to reload.
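The same workaround can presumably live in a gunicorn config file instead of on the command line. A minimal sketch, assuming a file named `gunicorn.conf.py` (the file name and its use here are an assumption, not something taken from this thread) and intended for local development only:

```python
# gunicorn.conf.py (assumed file name for this sketch)

# Equivalent of --graceful-timeout 0: do not wait for the worker to
# finish in-flight requests before it is stopped.
graceful_timeout = 0
```

Since this gives in-flight requests no time to finish, it is probably best kept out of production settings.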