Stop responding in a few days in wsgi mode

vooon commented 2 months ago

I've set up the OpenStack Placement API to run under Granian, but it stops responding after a day or two. Unfortunately I do not see anything in the logs; the process looks alive and the port shows up in netstat, but curl -v http://192.168.50.91:18778 just hangs.

I see the same problem with other components as well (Heat API, Barbican...), but Placement is very simple compared to the others: it does not use eventlet or RabbitMQ, just WebOb and SQLAlchemy. So it is probably easier to debug.

I suppose the problem is somewhere in the WSGI handling part, as the Skyline API server (which is written with FastAPI, and so deployed in ASGI mode) has been working without a problem for months.

gi0baro commented 2 months ago

@vooon can you also provide the full granian parameters/config you're using?

vooon commented 2 months ago

@gi0baro it's similar to the other services, since they use the same template; only the app factory differs, here placement.wsgi:init_application:

/usr/bin/granian /etc/granian/openstack_placement_api.py:application \
    --host 192.168.50.91 \
    --port 18778 \
    --interface wsgi \
    --workers 2 \
    --threads 4 \
    --log-level debug

gi0baro commented 2 months ago

My guess is that something is blocking the Python threads, so Granian runs out of worker threads to process requests (it could also be that the Rust runtime gets blocked, but I would expect connection refused/timeouts in that case).
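
To illustrate the failure mode guessed at above, here is a minimal sketch in plain Python. It does not use Granian at all: a standard-library thread pool stands in for the worker threads, and the names `demo_pool_exhaustion`, `stuck_handler` and `quick_handler` are made up for the example. Once every thread in a fixed-size pool is stuck, a new request is accepted but never served, which from the client side looks like a hang rather than an error.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def demo_pool_exhaustion(pool_size=2, wait=0.2):
    """Block every pool thread, then show that a trivial request starves."""
    release = threading.Event()
    # pool_size handlers plus the main thread meet here, so we know
    # for sure that every pool thread is occupied before probing.
    started = threading.Barrier(pool_size + 1)

    def stuck_handler():
        started.wait()
        release.wait()  # stands in for a call that never returns
        return "stuck handler finished"

    def quick_handler():
        return "ok"

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(pool_size):
            pool.submit(stuck_handler)
        started.wait()  # all pool threads are now busy

        probe = pool.submit(quick_handler)  # queued, no free thread
        try:
            probe.result(timeout=wait)
            hung = False
        except TimeoutError:
            hung = True

        release.set()  # unblock the stuck handlers
        recovered = probe.result(timeout=5)
    return hung, recovered


if __name__ == "__main__":
    print(demo_pool_exhaustion())
```

The probe only completes after the stuck handlers are released, which matches the reported symptom: the process is alive and the port is open, yet requests never get an answer.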

I'd suggest configuring --backpressure: with your configuration you can end up with 512 threads per worker interacting with Python code, which I guess is just too much. Also, as per the documentation, you won't benefit at all from --threads 4, so I would just remove it. In the end, I would change your run command to something like this (where N is the maximum Python concurrency you expect):

/usr/bin/granian /etc/granian/openstack_placement_api.py:application \
    --host 192.168.50.91 \
    --port 18778 \
    --interface wsgi \
    --workers 2 \
    --backpressure N \
    --log-level debug
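
If the hang comes back, one way to test the blocked-threads guess is to dump the Python stack of every thread in a hung worker. The sketch below uses the standard-library `faulthandler` module (Unix only). The helper name `install_stack_dump` and the log path are assumptions for illustration, and it also assumes SIGUSR1 is not already used for something else in the worker process, which I have not checked for Granian.

```python
import faulthandler
import signal

# Kept at module level so the file descriptor stays open for the
# lifetime of the process; faulthandler writes straight to it.
_dump_file = None


def install_stack_dump(path, signum=signal.SIGUSR1):
    """Dump the tracebacks of all threads to `path` when `signum` arrives."""
    global _dump_file
    _dump_file = open(path, "a")
    faulthandler.register(signum, file=_dump_file, all_threads=True)
    return _dump_file


# Hypothetical usage, e.g. near the top of the WSGI module:
# install_stack_dump("/tmp/placement-stacks.log")
```

With this in place, `kill -USR1 <worker pid>` on a worker that has stopped responding should append one traceback per thread to the file, showing where each thread is blocked. With several workers writing to the same path, a per-process file name may be easier to read.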

gi0baro commented 2 months ago

Closing this as stale

emmett-framework / granian

Stop responding in a few days in wsgi mode #383