Pylons / waitress

Waitress - A WSGI server for Python 3
https://docs.pylonsproject.org/projects/waitress/en/latest/

100% CPU on all instances when using fast-listen on Plone #396

Closed: cillianderoiste closed this issue 1 year ago

cillianderoiste commented 1 year ago

See also: https://github.com/plone/plone.recipe.zope2instance/issues/188

http-fast-listen is enabled by default on Plone and uses the waitress wasyncore dispatcher: https://github.com/plone/plone.recipe.zope2instance/blob/8b8ca44c89479c00edb73e2e0d74074f64982718/src/plone/recipe/zope2instance/ctl.py#L929

Under certain conditions (e.g. a starved CPU), starting multiple instances at the same time triggers a condition that causes all instances to use all of the available CPU. Disabling http-fast-listen, or starting the instances one at a time and waiting for CPU usage to drop before starting the next one, avoids the issue.

We're currently using the latest waitress release (2.1.2) and I was able to reproduce the issue on version 2.0.0 but not on 1.4.4.

digitalresistor commented 1 year ago

waitress.wasyncore is not a public API and should not be relied on. It has been, and will continue to be, changed for the benefit of waitress and waitress alone.

Second, have you interrupted the process and/or done any debugging to figure out where it is in the stack trace? I am not sure what http-fast-listen is, why it matters, or why it would cause waitress to hang. It looks like it's generating a bunch of sockets that are passed to waitress. Waitress just hands those off to the appropriate waitress server type:

https://github.com/Pylons/waitress/blob/main/src/waitress/server.py#L91-L120
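One way to answer the stack-trace question for an instance stuck at 100% CPU is the standard library's `faulthandler` module. A minimal sketch, assuming a POSIX host (`faulthandler.register()` is not available on Windows) and that a call can be added early in the instance's startup; `enable_stack_dumps` is a hypothetical helper name:

```python
import faulthandler
import signal
import sys


def enable_stack_dumps(signum: int = signal.SIGUSR1, file=sys.stderr) -> None:
    """Hypothetical helper: dump every thread's Python stack when `signum` arrives.

    faulthandler writes the dump from a C-level signal handler, so it does not
    wait for the interpreter to get around to running Python-level handlers.
    `file` must stay open for as long as the handler is registered.
    """
    faulthandler.register(signum, file=file, all_threads=True)
```

After calling it before the server starts, running `kill -USR1 <pid>` against a spinning instance a few times shows which frames keep recurring. If the process already uses SIGUSR1 for something else, a different signal number should be passed.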

digitalresistor commented 1 year ago

The linked code is doing things that make little sense.

It's creating a dispatcher() which owns the socket, but that dispatcher doesn't live very long. The socket number gets turned into a string (at this point it's a dangling file descriptor); it is then passed back to serve through global_conf, where it is turned back into a socket that is handed to waitress. The socket is left dangling inside the dispatcher(), which gets discarded at the end of the loop.

Second, it's calling .readable() on the dispatcher, but that function just returns True, as it is only ever used internally for select/poll.
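To illustrate why calling `.readable()` tells you nothing: in asyncore-style event loops, `readable()` is a predicate the loop calls to decide which sockets to hand to `select()`/`poll()`; it does not report whether a socket actually has anything to read. A toy model, using hypothetical `ToyDispatcher` and `poll_once` names (this is not waitress's `wasyncore` code):

```python
import select
import socket


class ToyDispatcher:
    """Hypothetical stand-in for an asyncore-style dispatcher (not waitress's class)."""

    def __init__(self, sock: socket.socket) -> None:
        self.socket = sock

    def readable(self) -> bool:
        # Answers only "should the event loop watch this socket for reads?".
        # It never looks at the socket, so True says nothing about pending data.
        return True


def poll_once(dispatchers, timeout: float = 0.0):
    """One pass of an event loop: readable() only decides what goes into select()."""
    watched = [d.socket for d in dispatchers if d.readable()]
    ready, _, _ = select.select(watched, [], [], timeout)
    return ready
```

A listening socket with no pending connection passes the `readable()` filter yet is absent from the `select()` result; only after a client connects does it show up as ready.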

That code should be rewritten to create sockets directly and pass them straight to waitress. Until you are able to reproduce the problem with simpler code that passes a list of sockets to waitress, or further evidence appears, I am not considering this a waitress bug.
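A minimal sketch of creating sockets directly and passing them to waitress, assuming plain IPv4 TCP listeners and the `sockets` argument of `waitress.serve()`, which takes a list of already-bound socket objects and cannot be combined with `host`/`port`/`listen`. `make_listen_socket` and `serve_prebound` are hypothetical names, and the early `listen()` call assumes the point of http-fast-listen is to let the kernel queue connections before the application has finished loading:

```python
import socket


def make_listen_socket(host: str, port: int, backlog: int = 1024) -> socket.socket:
    """Hypothetical helper: create, bind and start listening on an IPv4 TCP socket.

    Listening early lets the kernel queue incoming connections until the WSGI
    server starts accepting them (assumed to be what http-fast-listen is for).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def serve_prebound(app, addresses):
    """Hypothetical helper: bind every (host, port) pair, then hand the sockets to waitress."""
    # Imported here so make_listen_socket() works without waitress installed.
    import waitress

    # Real socket objects stay referenced for the life of the process; no
    # file-descriptor-to-string round trip through global_conf is involved.
    sockets = [make_listen_socket(host, port) for host, port in addresses]
    waitress.serve(app, sockets=sockets)
```

For example, `serve_prebound(app, [("0.0.0.0", 8080)])` would bind before the call to `waitress.serve()` blocks, and nothing depends on a bare descriptor number outliving the object that owned it.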

I am aware of several deployments that use systemd-activated sockets and pass them to waitress without any issues; if this were a waitress bug, I would have expected screaming by now.
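For reference, a minimal sketch of the systemd socket-activation pattern, following the `sd_listen_fds(3)` convention (inherited descriptors start at fd 3, and `LISTEN_PID`/`LISTEN_FDS` are set in the environment). `inherited_sockets` is a hypothetical helper, and it assumes Python 3.7+, where `socket.socket(fileno=...)` detects the family and type on its own:

```python
import os
import socket
from typing import List, Mapping

SD_LISTEN_FDS_START = 3  # first passed descriptor, per the sd_listen_fds(3) convention


def inherited_sockets(environ: Mapping[str, str] = os.environ,
                      start: int = SD_LISTEN_FDS_START) -> List[socket.socket]:
    """Hypothetical helper: wrap the listening sockets systemd passed to this process."""
    # LISTEN_PID must match our PID, so a child that merely inherited the
    # variables ignores them.
    if environ.get("LISTEN_PID") != str(os.getpid()):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    # socket.socket(fileno=...) auto-detects family and type on Python 3.7+.
    return [socket.socket(fileno=fd) for fd in range(start, start + count)]
```

The result would then be handed over as `waitress.serve(app, sockets=inherited_sockets())`, so the socket objects are held by the server rather than by a short-lived wrapper.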

cillianderoiste commented 1 year ago

Sorry for leaving this open, and thanks for your advice, @bertjwregeer; it's greatly appreciated.