Pylons / waitress

Waitress - A WSGI server for Python 3
https://docs.pylonsproject.org/projects/waitress/en/latest/

100% cpu in mainthread due to not closing properly? (channel.connected == False) #418

Closed djay closed 3 months ago

djay commented 1 year ago

Following on from debugging in this issue - https://github.com/collective/haufe.requestmonitoring/issues/15

What we see is waitress switching to 100% CPU and staying there. It happens in production randomly (within a week) and we haven't traced it back to a particular request.

Using a sampling profiler on waitress with 2 threads (in prod), we identified the thread using the CPU as the main thread (via top -H); this is the profile. Note that since this is prod there are other requests, so not all activity is related to the looping that causes this bug.

 Austin  TUI   Wall Time Profile                                                                                             CPU  99% ▇█▇█▇▇▇▇   MEM 263M ████████    5/5
   _________   Command /app/bin/python3.9 /app/parts/instance/bin/interpreter /app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py /app/parts/instance/etc/wsgi.ini
   ⎝__⎠ ⎝__⎠   Python 3.9.0     PID 3351466     PID:TID 3351466:11          
               Samples 1451365  ⏲️   3'16"       Threshold 0%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                                                                     
  00"    3'16"     0.0%  100.0%  └─ <module> (/app/parts/instance/bin/interpreter:326)                                                                                       ▒
  00"    3'16"     0.0%  100.0%     └─ <module> (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:255)                                                                    ▒
  00"    3'16"     0.0%  100.0%        └─ main (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:251)                                                                     ▒
  00"    3'16"     0.0%  100.0%           └─ run (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:217)                                                                   │
  00"    3'16"     0.0%  100.0%              └─ serve (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:203)                                                              │
  00"    3'16"     0.0%  100.0%                 └─ serve (/app/eggs/plone.recipe.zope2instance-6.11.0-py3.9.egg/plone/recipe/zope2instance/ctl.py:942)                       │
  00"    3'16"     0.0%  100.0%                    └─ serve_paste (/app/eggs/plone.recipe.zope2instance-6.11.0-py3.9.egg/plone/recipe/zope2instance/ctl.py:917)              │
  00"    3'16"     0.0%  100.0%                       └─ serve (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/__init__.py:19)                                                  │
  00"    3'16"     0.0%  100.0%                          └─ run (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/server.py:322)                                                  │
  05"    3'16"     2.5%   99.9%                             ├─ loop (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:245)                                           │
  36"     44"     18.3%   22.4%                             │  ├─ poll (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:158)                                        │
  05"     05"      2.4%    2.4%                             │  │  ├─ readable (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/server.py:290)                                    │
  01"     02"      0.4%    0.9%                             │  │  ├─ write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:117)                                    │
  01"     01"      0.4%    0.5%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:517)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  ├─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:95)                          │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:514)                    │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:515)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ poll (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:150)                                  │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  └─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:509)                    │
  00"     00"      0.0%    0.0%                             │  │  │     └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.1%                             │  │  ├─ write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:113)                                    │
  00"     00"      0.1%    0.1%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:517)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  └─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:509)                    │
  00"     00"      0.1%    0.1%                             │  │  ├─ readable (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:154)   

From the profiling it looks like the channel is writable but channel.connected == False. So it goes into a loop without writing or closing, since it never actually does anything to the socket. https://github.com/Pylons/waitress/blob/main/src/waitress/channel.py#L98
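To make the loop mechanism concrete, here is a minimal standalone sketch (not waitress code) of the same shape: a write-ready socket whose handler refuses to act never changes state, so every select() call reports it again immediately.

# Minimal standalone sketch (not waitress code) of the busy-loop shape.
import select
import socket

a, b = socket.socketpair()      # "a" plays the accepted server-side socket
connected = False               # the inconsistent flag described above

def handle_write(sock):
    if not connected:           # same early return as channel.py#L98
        return                  # nothing is written, nothing is closed

for _ in range(5):              # in waitress this loop never terminates
    _, writable, _ = select.select([], [a], [], 0)
    for s in writable:
        handle_write(s)         # does nothing, so "a" stays write-ready

a.close()
b.close()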

EDIT: My suspicion is that what started this was a client that shut down (half-closed) very quickly after a connect, before the dispatcher finished being set up. This causes getpeername to fail with EINVAL and connected = False.

https://github.com/Pylons/waitress/blob/4f6789b035610e0552738cdc4b35ca809a592d48/src/waitress/wasyncore.py#L310

            try:
                self.addr = sock.getpeername()
            except OSError as err:
                if err.args[0] in (ENOTCONN, EINVAL):
                    # To handle the case where we got an unconnected
                    # socket.
                    self.connected = False
                else:
                    # The socket is broken in some unknown way, alert
                    # the user and remove it from the map (to prevent
                    # polling of broken sockets).
                    self.del_channel(map)
                    raise

Could be same issue as https://github.com/Pylons/waitress/issues/411 but hard to tell.

One fix in #419 but could be better ways?

djay commented 1 year ago

@d-maurer

The error is where self.connected was set to False. There, it should have been ensured that the corresponding "fileno" is removed from socket_map and that it will not be put there again (as long as self.connected remains False).

Something exceptional must have brought waitress into this state (otherwise, we would have lots of 100 % CPU usage reports). I assume that some bad client has used the system call shutdown to close only part of the socket connection and that waitress does not anticipate something like that.

Waitress does seem to close properly if a shutdown is received (empty data); see https://github.com/Pylons/waitress/blob/main/src/waitress/wasyncore.py#L449

So we have to keep looking for a way connected can be False while the channel is still trying to write. Yes, it is most likely bad actors. We get hit by this a lot in our line of business.

d-maurer commented 1 year ago

Dylan Jay wrote at 2023-9-11 07:56 -0700:

@d-maurer

The error is where self.connected was set to False. There, it should have been ensured that the corresponding "fileno" is removed from socket_map and that it will not be put there again (as long as self.connected remains False).

Something exceptional must have brought waitress into this state (otherwise, we would have lots of 100 % CPU usage reports). I assume that some bad client has used the system call shutdown to close only part of the socket connection and that waitress does not anticipate something like that.

Waitress does seem to close properly if a shutdown is received (empty data); see https://github.com/Pylons/waitress/blob/main/src/waitress/wasyncore.py#L449

I would expect an empty data read if the sending part of the connection is shut down. However, the client might shut down the receiving part; to me, that does not suggest an empty data read.

d-maurer commented 1 year ago

Dylan Jay wrote at 2023-9-11 07:56 -0700:

@d-maurer

The error is where self.connected was set to False. There, it should have been ensured that the corresponding "fileno" is removed from socket_map and that it will not be put there again (as long as self.connected remains False).

Something exceptional must have brought waitress into this state (otherwise, we would have lots of 100 % CPU usage reports). I assume that some bad client has used the system call shutdown to close only part of the socket connection and that waitress does not anticipate something like that.

Waitress does seem to close properly if a shutdown is received (empty data); see https://github.com/Pylons/waitress/blob/main/src/waitress/wasyncore.py#L449

I have an idea how to fix the inconsistency without understanding how it came into being: handle_write_event is (likely) only called by the loop. Thus, if it is called and the connection is closed, it can detect an inconsistency (a closed connection should not get events) and resolve it by deregistering from the loop.

djay commented 1 year ago

@d-maurer yes, it could work to insert a self.del_channel() call in https://github.com/Pylons/waitress/blob/84360df4c5b4da7c72439bdbe919a84b3c619075/src/waitress/channel.py#L97
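A sketch of what that insertion could look like (just the idea, not the exact patch in the PR):

    def handle_write(self):
        if not self.connected:
            # Instead of returning and leaving the fd registered, drop the
            # channel from the socket map so select() stops reporting it
            # as write-ready forever.
            self.del_channel()
            return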

djay commented 1 year ago

@d-maurer I created a PR; it passes the current tests and they hit that line, but it's hard to know how to make a test for this scenario...

https://github.com/Pylons/waitress/pull/419

d-maurer commented 1 year ago

Dylan Jay wrote at 2023-9-11 10:10 -0700:

@d-maurer I created a PR; it passes the current tests and they hit that line, but it's hard to know how to make a test for this scenario...

https://github.com/Pylons/waitress/pull/419

Some test should not be that difficult: you register a write handler for a connection and then set connected to False. In the next loop run, there will be a write event and you can check that it does not end in an infinite loop.

Of course, the test only verifies that a non-connected channel does not lead to an infinite loop. It does not try to set up a realistic case for setting connected to False.

djay commented 1 year ago

@mcdonc another solution instead of #419 might be the one below. Is that preferable?

def poll(timeout=0.0, map=None):
    if map is None:  # pragma: no cover
        map = socket_map
    if map:
        r = []
        w = []
        e = []
        for fd, obj in list(map.items()):  # list() call FBO py3
            # prevent getting into a loop for sockets disconnected but not properly closed.
            if obj.check_client_disconnected():
                obj.del_channel()
                continue

Perhaps you have a better idea of how it could have got into this knot, and of the best way to test it?

djay commented 1 year ago

@mcdonc one code path that could perhaps lead to this is

https://github.com/Pylons/waitress/blob/4f6789b035610e0552738cdc4b35ca809a592d48/src/waitress/wasyncore.py#L313

since connecting == False there as well, there doesn't seem to be a way for it to write data out or close?

EDIT: one scenario could be that the client half-disconnected very quickly, before the dispatcher was set up, so getpeername fails? But somehow the socket can still be written to?

djay commented 1 year ago

It looks like it is possible for a connection that's been broken before getpeername to then show no error in select, in the case where there is nothing to read (since a read would result in a close). https://stackoverflow.com/questions/13257047/python-select-does-not-catch-broken-socket. Not sure how it has something to write in that case. Maybe a shutdown for read only, done very quickly?

EDIT: https://man7.org/linux/man-pages/man3/getpeername.3p.html

"EINVAL The socket has been shut down." <- so looks like shutdown for read very quickly seems possible to create this tight loop.

djay commented 1 year ago

Or somehow the getpeername is invalid and that results in an OSError, and there is nothing to read but something to write. But I'm not sure if that results in the EINVAL or not.

d-maurer commented 1 year ago

Dylan Jay wrote at 2023-9-11 18:35 -0700:

@mcdonc another solution instead of #419 might be the one below. Is that preferable?

def poll(timeout=0.0, map=None):
   if map is None:  # pragma: no cover
       map = socket_map
   if map:
       r = []
       w = []
       e = []
       for fd, obj in list(map.items()):  # list() call FBO py3
           # prevent getting into a loop for sockets disconnected but not properly closed.
           if obj.check_client_disconnected():
               obj.del_channel()
               continue

Not sure. It, too, uses del_channel suggesting "delete channel". While "delete channel" likely removes the channel from the socket map, it likely does more than that -- and maybe things not best placed into poll.

Another point: the map may contain objects (e.g. non-channels) without del_channel and check_client_disconnected. In those cases, the code above would raise an exception and bring your service down.

Currently, I favor the following reasoning:

You have detected that a write event for a "non connected" connection leads to a busy loop. We must do something about it (either prevent the case from happening or clean up locally).

For the local cleanup, I proposed to unregister the "write event handler". You have translated this into a call to del_channel, likely because the loop does not support "write event handler"s. The loop API only supports the registration of objects (with fileno, readable and writable, ... methods); those objects' writable method indicates whether the object is interested in write event notifications.

I propose to implement "unregister the write event handler" not with a del_channel call but with a modification of writable: ensure it returns False for a non-connected connection.
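A minimal sketch of that idea, assuming the writable definition quoted later in this thread:

    def writable(self):
        # A channel that has lost its connection can never flush its output,
        # so stop asking select() for write readiness and avoid the busy loop.
        return self.connected and (
            self.total_outbufs_len or self.will_close or self.close_when_flushed
        )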

In the long run waitress should likely change its "connected" concept. HTTP is based on TCP which implements bidirectional communication channels. The shutdown system call allows applications to shut down individual directions. This is not compatible with a boolean "connected", instead we have 4 connection states: (fully) disconnected, read connected, write connected and (fully) connected.

I do not know whether important HTTP clients exist which use shutdown. Some might want to signal "no more input" by shutting down the outgoing communication channel, e.g. for a long download.
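Such a client would look roughly like this (a generic illustration with a placeholder host, not any particular client library):

# Generic client illustration: half-close the sending side after the request,
# then keep reading the (possibly long) response.
import socket

c = socket.create_connection(("example.org", 80))    # placeholder host
c.sendall(b"GET /big-file HTTP/1.1\r\nHost: example.org\r\n\r\n")
c.shutdown(socket.SHUT_WR)     # "no more request data from me"
response = b""
while True:
    chunk = c.recv(65536)      # still able to receive the long download
    if not chunk:
        break
    response += chunk
c.close()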

perhaps you have a better idea on how it could have got into this knot

My primary thought: It must be something exceptional which caused the situation. Otherwise, lots of 100 % CPU usage reports should have been seen.

Initially, I thought of a use of shutdown to (only) partially shut down the socket. Likely, such a use is unanticipated and may reveal bugs.

Meanwhile, I can think of other causes: Maybe, when the connection is closed while the response is being streamed back, an inconsistency can creep in. The "async" world should be quite robust against such types of problems (because all IO logic is serialized) but the system is connected to a network, and important network events (e.g. the closing of a communication channel) can happen at any time and cause unanticipated exceptions changing the normal control flow.

and the best way to test?

We have here the case that we cannot reproduce the real situation (because we do not know how the inconsistency was introduced). We therefore only implement a workaround.

My favorite workaround (ensure "writable" returns "False" when "not connected") is so simple that no test is necessary to convince us that the busy loop is avoided.

The workaround may not solve all problems. For example, it may keep a socket in use which should in principle have been closed.

d-maurer commented 1 year ago

Dylan Jay wrote at 2023-9-11 21:59 -0700:

Or somehow the getpeername is invalid and that results in an OSError, and there is nothing to read but something to write. But I'm not sure if that results in the EINVAL or not.

I do not think this goes in the right direction: waitress is an HTTP server and (unlike e.g. a telnet server) it produces output only after it has received input. Thus, I expect that the problematic connection was once in a connected state (to read something and then to produce output).

djay commented 1 year ago

@d-maurer I'm fairly sure I have one solid explanation of how this could occur.

Outlined in this test - https://github.com/Pylons/waitress/pull/419/files#diff-5938662f28fcbb376792258701d0b6c21ec8a1232dada6ad2ca0ea97d4043d96R775

NOTE: I haven't worked out a way to detect the looping in a test yet. So the assert at the end is not correct.

It is as you say. There is a shutdown of the read side only, but this is a race condition: it has to happen before the dispatcher is created, so right after the connect. I've confirmed this results in getpeername returning OSError EINVAL and thus connected = False, and the select still thinks it can write, so the loop will be infinite. Or maybe only until the bad actor properly closes the connection; not sure on that one.

In the long run waitress should likely change its "connected" concept. HTTP is based on TCP which implements bidirectional communication channels. The shutdown system call allows applications to shut down individual directions. This is not compatible with a boolean "connected", instead we have 4 connection states: (fully) disconnected, read connected, write connected and (fully) connected.

True. But if I'm right on the cause of this, the socket would never have connected=False with most shutdowns, only when it happens too quickly. That flag is mostly used to indicate not yet connected or in the process of closing.

My favorite workaround (ensure "writable" returns "False" when "not connected") is so simple that no test is necessary to convince us that the busy loop is avoided.

Yes, that will also work; I'll switch it to that. There is a system to remove inactive sockets, so I guess that would get them closed eventually. I'm not really sure of the pros and cons of leaving sockets open vs. just closing them for this case (I tried closing; it also worked in terms of the tests).

djay commented 1 year ago

@d-maurer I pushed new code that uses writable instead.

d-maurer commented 1 year ago

Dylan Jay wrote at 2023-9-12 03:59 -0700:

... It is as you say. There is a shutdown of the read side only, but this is a race condition: it has to happen before the dispatcher is created, so right after the connect. I've confirmed this results in getpeername returning OSError EINVAL and thus connected = False, and the select still thinks it can write, so the loop will be infinite.

I do not know waitress implementation details, BUT in general write notifications are asked for only AFTER output has been generated (i.e. writable will only return True once data to be written has been generated).

As explained earlier, an HTTP server first reads data from a connection before it writes to the connection.

If you are right with your assumption above, then reading has been possible (despite a "not connected") and output was generated based on this input.

or maybe until the bad actor properly closes the connection. not sure on that one.

The connection's writable must also return True (otherwise, the corresponding fd will not be included in the write fd set). Usually, this would happen if it is known that there is data to be output.

djay commented 12 months ago

@d-maurer maybe a core contributor can step in and advise the best solution and test. @digitalresistor @kgaughan ?

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-12 20:02 -0700:

@d-maurer maybe a core contributor can step in and advise the best solution and test. @digitalresistor @kgaughan ?

I had a closer look at the code and I think I found a realistic scenario to enter the busy loop state: If HTTPChannel.read reads empty data, it sets connected to False; if there is pending output at that time we are in the busy loop state.

We can force HTTPChannel.read to read empty data by letting the HTTP client shutdown its sending direction. Once all data has been read by the receiving site, its next recv will return empty data. A normal close (rather than shutdown) might have a similar effect.

The hypothesis can be checked in the following way: Design an HTTP service to produce sufficient output to saturate the output channel. Design an HTTP client: it sends a request to the service (but does not read the response), waits sufficiently long that the service has produced its output, then shuts down the writing direction of its HTTP connection (or maybe just closes its HTTP connection). Check whether this brings waitress into the busy loop state.
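A rough sketch of that client-side check (host, port and path are placeholders):

# Rough sketch of the check described above (placeholder host/port/path):
# request a large response, never read it, then shut down our writing
# direction and watch the waitress process for sustained 100% CPU.
import socket
import time

c = socket.create_connection(("127.0.0.1", 8080))
c.sendall(b"GET /large-response HTTP/1.1\r\nHost: localhost\r\n\r\n")
time.sleep(5)                  # let the service saturate its output buffers
c.shutdown(socket.SHUT_WR)     # or simply c.close()
time.sleep(300)                # keep our end alive while watching the server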

djay commented 12 months ago

@d-maurer that was my initial thought, but as I pointed out in https://github.com/Pylons/waitress/issues/418#issuecomment-1714057512, recv in wasyncore will do handle_close on getting empty data and take it out of the map, so I couldn't see any way for no bytes being sent to cause this loop.

    def recv(self, buffer_size):
        try:
            data = self.socket.recv(buffer_size)
            if not data:
                # a closed connection is indicated by signaling
                # a read condition, and having recv() return 0.
                self.handle_close()
                return b""
            else:
                return data
        except OSError as why:
            # winsock sometimes raises ENOTCONN
            if why.args[0] in _DISCONNECTED:
                self.handle_close()
                return b""
            else:
                raise

djay commented 12 months ago

Also, when I did some testing it did seem like the select would indicate a write was possible even without the back end producing any data. So no read is needed, just a connect and a very quick shutdown. But I do have to work out a proper test for that.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-13 00:07 -0700:

@d-maurer that was my initial thought, but as I pointed out in https://github.com/Pylons/waitress/issues/418#issuecomment-1714057512, recv in wasyncore will do handle_close on getting empty data and take it out of the map, so I couldn't see any way for no bytes being sent to cause this loop.

You are right! I missed (had forgotten) that.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-13 00:11 -0700:

Also when I did some testing it did seem like the select would indicate a write was possible even without the back end producing any data.

A select will almost always report a possible write. (For a "normal" socket) the only exception is that the write buffer is saturated. That's why writable must return False unless there is data to write (or handle_write will be able to clean up the state).
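This is easy to see with plain sockets (a standalone demo, not waitress code): a freshly accepted TCP connection is reported write-ready even though nothing has been queued on it.

# Standalone demo: select() reports a connected TCP socket as write-ready
# as long as its send buffer has room, independent of any pending data.
import select
import socket

srv = socket.create_server(("127.0.0.1", 0))    # port 0 picks a free port
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

_, writable, _ = select.select([], [conn], [], 0)
print(conn in writable)     # True, although we have no data to send

for s in (cli, conn, srv):
    s.close()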

So there is no read needed. Just a connect and very quick shutdown. But I do have to work out a proper test for that.

Only if waitress defines its writable in a non-standard way: typically, writable would only return True if output was pending.

In channel.py writable is defined as:

        return self.total_outbufs_len or self.will_close or self.close_when_flushed

Thus, it is not completely standard. However, as far as I could see, will_close and close_when_flushed can only be set during request processing, i.e. after input has been received.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-13 00:07 -0700:

@d-maurer that was my initial thought but as I pointed out in https://github.com/Pylons/waitress/issues/418#issuecomment-1714057512 recv in wasynccore will do handle_close on getting empty data and take it out of the map as I couldn't see any way for no bytes being sent to cause this loop.

I have meanwhile read the Python socket HOWTO (--> "https://docs.python.org/3/howto/sockets.html#socket-howto"). It recommends (in the "Disconnecting" section) to operate in an HTTP-like exchange: send the request and then use shutdown(1) to indicate "I (the client) will produce no more output but am still ready for input".

The behavior of waitress you point out above (close as soon as there is no more input) will not play well with this recommendation.

djay commented 12 months ago

@d-maurer that would be a different bug in waitress.

My problem is I run out of CPU on my servers if I don't restart them often due to these weird requests we are receiving. That no one else in the world seems to get :(

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-13 04:26 -0700:

... My problem is I run out of CPU on my servers if I don't restart them often due to these weird requests we are receiving. That no one else in the world seems to get :(

Would you share the version of waitress with which you are observing the behavior?

d-maurer commented 12 months ago

Dieter Maurer wrote at 2023-9-13 09:57 +0200:

Dylan Jay wrote at 2023-9-13 00:11 -0700:

Also when I did some testing it did seem like the select would indicate a write was possible even without the back end producing any data.

A select will almost always report a possible write. (For a "normal" socket) the only exception is that the write buffer is saturated. That's why writable must return False unless there is data to write (or handle_write will be able to clean up the state).

So there is no read needed. Just a connect and very quick shutdown. But I do have to work out a proper test for that.

Only if waitress defines its writable in a non-standard way: typically, writable would only return True if output was pending.

In channel.py writable is defined as:

       return self.total_outbufs_len or self.will_close or self.close_when_flushed

Thus, it is not completely standard. However, as far as I could see, will_close and close_when_flushed can only be set during request processing, i.e. after input has been received.

will_close can be set by server.BaseWSGIServer.maintenance, i.e. independent of a task/request. Thus, you might be right with your hypothesis: connected set to False in wasyncore.dispatcher.__init__ due to a connection race condition; later a busy loop due to not connected and writable.

You could verify this as follows:

Add a sufficiently large sleep into wasyncore.dispatcher.__init__ before the getpeername call.

Open a connection to the server and immediately close it again.

The sleep should ensure that at the time of the getpeername call, the remote socket end is already closed (maybe, getpeername then fails with an exception and connected is set to False).

Wait sufficiently long (somewhat longer than cleanup_interval) to let maintenance set will_close.

If you are right, this will result in a busy loop.

d-maurer commented 12 months ago

Dieter Maurer wrote at 2023-9-13 14:26 +0200:

... will_close can be set by server.BaseWSGIServer.maintenance, i.e. independent of a task/request. Thus, you might be right with your hypothesis: connected set to False in wasyncore.dispatcher.__init__ due to a connection race condition; later a busy loop due to not connected and writable.

You could verify this as follows:

Add a sufficiently large sleep into wasyncore.dispatcher.__init__ before the getpeername call.

Open a connection to the server and immediately close it again.

The sleep should ensure that at the time of the getpeername call, the remote socket end is already closed (maybe, getpeername then fails with an exception and connected is set to False).

Wait sufficiently long (somewhat longer than cleanup_interval) to let maintenance set will_close.

If you are right, this will result in a busy loop.

On my system (Linux, kernel 5.4.0), getpeername returns the peer address even after the socket's remote end has been closed. Verified via the following interactive code; server code:

from socket import socket, AF_INET, SOCK_STREAM

ss = socket(AF_INET, SOCK_STREAM)
ss.bind(("localhost", 10000))
ss.listen()
cs, addr = ss.accept()
# run the client code in a second interactive session
cs.getpeername()

client code:

from socket import socket, AF_INET, SOCK_STREAM
cs = socket(AF_INET, SOCK_STREAM)
cs.connect(("localhost", 10000))
cs.close()

d-maurer commented 12 months ago

"EINVAL The socket has been shut down." <- so looks like shutdown for read very quickly seems possible to create this tight loop.

Note that a socket has 2 ends. "The socket has been shut down" might refer to the local (not the remote) end.

digitalresistor commented 12 months ago

@d-maurer that would be a different bug in waitress.

My problem is I run out of CPU on my servers if I don't restart them often due to these weird requests we are receiving. That no one else in the world seems to get :(

I would highly recommend that you don't run waitress bare on the internet. Best practice is to place it behind a load balancer of some sort.

There are other scenarios in which Waitress does not deal well with certain request patterns, depending on what type of content you are serving with waitress (what your app generates, how it generates it, how large those responses are). Waitress, for example, does not deal well with clients that read very slowly when the response is larger than the various buffers it tries to use internally, thereby allowing a client to hold up an app thread directly (it can't pop from the WSGI app when the buffer is full/the high-water mark is reached).

djay commented 12 months ago

@d-maurer that would be a different bug in waitress. My problem is I run out of CPU on my servers if I don't restart them often due to these weird requests we are receiving. That no one else in the world seems to get :(

I would highly recommend that you don't run waitress bare on the internet. Best practice is to place it behind a load balancer of some sort.

@digitalresistor of course it's not bare on the internet. There are 4 reverse proxies in front of it. Yet this bug still happens.

djay commented 12 months ago

@d-maurer you are right. shutdown doesn't seem to make getpeername fail. I've created a test and I can't get getpeername to fail yet.

https://github.com/djay/waitress/blob/5130bec0501567c809397cc3d8003c0af11fb46f/tests/test_server.py#L338

djay commented 12 months ago

@d-maurer It's possible that macOS getpeername works differently from Linux, because there seem to be many reports that getpeername will fail if the connection is broken. I have yet to run this on Linux.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-13 21:45 -0700:

@d-maurer It's possible that macOS getpeername works differently from Linux, because there seem to be many reports that getpeername will fail if the connection is broken. I have yet to run this on Linux.

While I do not think that getpeername is the cause of the problems you observe (I expect that the getpeername information is set up at the time of the accept and is then static until the local end of the socket is shut down, i.e. independent of the remote end), I think the channel handling in wasyncore.dispatcher.__init__ could be improved: instead of:

    connected = True
    set_channel()
    ...
    if  ...
      connected = False
    ...

we should have:

    connected = True
    ...
    if  ...
      connected = False
    ...
    if connected: set_channel()

I.e. we should register a channel only after we decided that the corresponding socket is functional.

djay commented 12 months ago

@d-maurer https://bugs.python.org/issue28447 https://stackoverflow.com/questions/53695564/is-there-any-way-to-getpeername-in-python-after-the-socket-disconnects

Yes, I agree that the code around getpeername in wasyncore.dispatcher.__init__ doesn't look right, but it was put there for a reason, and until we get someone who knows this code well, like @digitalresistor, it's hard to know what the better change is.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-14 00:23 -0700:

@d-maurer https://bugs.python.org/issue28447 https://stackoverflow.com/questions/53695564/is-there-any-way-to-getpeername-in-python-after-the-socket-disconnects

I checked on Linux with kernel 5.15.0 (like before for kernel 5.4.0): shutting down the remote end of the socket has not affected the getpeername on the local end.

It might be that a "keepalive" would change the behavior.

d-maurer commented 12 months ago

Dieter Maurer wrote at 2023-9-14 11:17 +0200:

Dylan Jay wrote at 2023-9-14 00:23 -0700:

@d-maurer https://bugs.python.org/issue28447 https://stackoverflow.com/questions/53695564/is-there-any-way-to-getpeername-in-python-after-the-socket-disconnects

I checked on Linux with kernel 5.15.0 (like before for kernel 5.4.0): shutting down the remote end of the socket has not affected the getpeername on the local end.

It might be that a "keepalive" would change the behavior.

The keepalive configuration makes the difference. With it, the getpeername can fail with error "107: Transport endpoint is not connected".

djay commented 12 months ago

@d-maurer I added keepalive to the test and still couldn't get it to show the bug.

I also found this - https://stackoverflow.com/questions/13257047/python-select-does-not-catch-broken-socket?rq=3. It's saying that if the connection was closed, select would indicate a read and then waitress would do handle_close... so I'm not sure what that leaves as the cause of this problem.

Another thing I researched is whether there is some way for getpeername to fail because of a parsing failure of an address, like IPv6 or something. But it didn't seem likely, even more so because in my case it was an internal connection from haproxy.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-14 02:38 -0700:

... I also found this - https://stackoverflow.com/questions/13257047/python-select-does-not-catch-broken-socket?rq=3. It's saying that if the connection was closed, select would indicate a read and then waitress would do handle_close... so I'm not sure what that leaves as the cause of this problem.

readable will return False if will_close is set. This would ignore the read event.

Thus, we still have a potential race condition: the race looks as follows:

  1. the main server accepts a new connection and adds it to the event loop; then the getpeername for it fails, which unsets connected
  2. in the next loop run, channel maintenance is started and sets will_close for the connection. This is a connection state where read events are ignored, writable returns True, and we get the busy select loop

Of course, 2. is extremely unlikely: it would mean that a newly created connection is almost immediately closed. The default channel_timeout is 2 min. With this value, 2. should be virtually impossible.

djay commented 12 months ago

@d-maurer

  1. in the next loop run, channel maintenance is started and sets will_close for the connection. This is a connection state where read events are ignored, writable returns True, and we get the busy select loop

Yes, you are right. The maintenance relying on will_close doesn't seem right. Since it prevents reads, the only way it can close is if a write happens, and connected = False prevents that. So that explains why maintenance never cleans up this connection. I guess the intention is to flush out any remaining output before closing, but if it's really a stale connection, why bother? Just close it.

But still, the most likely cause is getpeername failing in init.

Of course, 2. is extremely unlikely: it would mean that a newly created connection is almost immediately closed. The default channel_timeout is 2 min. With this value, 2. should be virtually impossible.

Yes, I don't know how either. There could be some combination of keepalive or something that results in the server knowing it's disconnected quickly.

You can see in

https://github.com/djay/waitress/blob/ea3cb3a1d4b85bc9f9d3de459619824179f4f969/tests/test_server.py#L316

I've tried some keepalive options and none of them have got getpeername to fail

    def test_quick_shutdown(self):
        sockets = [socket.socket(socket.AF_INET, socket.SOCK_STREAM)]
        sockets[0].bind(("127.0.0.1", 8000))
        sockets[0].setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # sockets[0].setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1)
        sockets[0].setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)
        sockets[0].setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
        sockets[0].listen()
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)

        inst = self._makeWithSockets(_start=False, sockets=sockets)
        from waitress.channel import HTTPChannel
        class ShutdownChannel(HTTPChannel):
            def __init__(self, server, sock, addr, adj, map=None):
                client.shutdown(socket.SHUT_WR)
                # client.close()
                sleep(3)
                return HTTPChannel.__init__(self, server, sock, addr, adj, map)
        inst.channel_class = ShutdownChannel

        client.connect(("127.0.0.1", 8000))
        inst.handle_accept()

        channel = list(iter(inst._map.values()))[-1]
        self.assertEqual(channel.__class__, ShutdownChannel)
        # self.assertEqual(channel.socket.getpeername(), "")
        self.assertRaises(Exception, channel.socket.getpeername)
        self.assertFalse(channel.connected, "race condition means our socket is marked not connected")

        inst.task_dispatcher = DummyTaskDispatcher()
        selects = 0
        orig_select = select.select

        def counting_select(r, w, e, timeout):
            nonlocal selects 
            rr, wr, er = orig_select(r, w, e, timeout)
            if rr or wr or er:
                selects += 1
            return rr, wr, er

        select.select = counting_select

        # Modified server run
        inst.asyncore.loop(
            timeout=inst.adj.asyncore_loop_timeout,
            map=inst._map,
            use_poll=inst.adj.asyncore_use_poll,
            count=2
        )
        select.select = orig_select
        sockets[0].close()
        self.assertEqual(selects, 0, "ensure we aren't in a loop trying to write but can't")

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-14 19:38 -0700:

... You can see in

https://github.com/djay/waitress/blob/ea3cb3a1d4b85bc9f9d3de459619824179f4f969/tests/test_server.py#L316

I've tried some keepalive options and none of them have got getpeername to fail

I was able to let getpeername fail (with 2 interactive sessions emulating a server and a client, executing the code I posted earlier with the (additional) keepalive configuration shown in the stack overflow question you referenced earlier; I assume the failure happened because I did not switch fast enough from the client to the getpeername call in the server). BUT a failing getpeername is only one precondition for your hypothesis.

I think we do not yet have a convincing scenario. Maybe, we should look elsewhere, too.

What your strace output has shown: We get consecutive selects with a connection "c" and not c.readable() and c.writable() and select reports write ready for "c" but no output is tried. Because no output is tried, the condition above does not change and we get a busy loop.

Up to now, we concentrated on not connected because this would effectively ignore the write_ready_event and therefore not try output. But we must also explain the not c.readable().

readable is defined as:

        return not (
            self.will_close
            or self.close_when_flushed
            or len(self.requests) > self.adj.channel_request_lookahead
            or self.total_outbufs_len
        )

The last 3 conditions can only be met once a request has been received. will_close can be set either during request processing or by maintenance.

While theoretically possible, it is very unlikely that maintenance has set will_close fast enough not to see the "remote write direction shutdown" event (which would have closed "c"). I therefore think that a request has been received (and likely has even generated output). Theoretically this is possible.

I looked at other potential causes preventing output.

Thus, no really convincing scenario here either.

Maybe, the next step could be trying to understand why my (2 interactive sessions) test succeeds to get a failing getpeername but your test does not.

djay commented 12 months ago

@d-maurer

Maybe, the next step could be trying to understand why my (2 interactive sessions) test succeeds to get a failing getpeername but your test does not.

Yes, perhaps try the test? How long are you waiting before running getpeername? Maybe it's a different platform? I did try it on Linux in Docker but no different.

I had another look through the code to find any other way connected could become False. It seems like only exceptions in handle_close or dispatcher init could result in this, but I couldn't see anything that might fail in a way that leaves the channel in place with connected = False.

Maybe another way is to add logging when getpeername fails and put that into production, to prove that's the code path that causes this, and exactly which kind of OSError.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-15 01:20 -0700:

@d-maurer

Maybe, the next step could be trying to understand why my (2 interactive sessions) test succeeds to get a failing getpeername but your test does not.

Yes, perhaps try the test?

I did and can now reproduce the getpeername failure.

The problem with your original: it can take an arbitrary time before getpeername raises an error. My modification looks like:

        inst = self._makeWithSockets(_start=False, sockets=sockets)
        from threading import Event
        from time import time
        ev = Event()
        from waitress.channel import HTTPChannel
        class ShutdownChannel(HTTPChannel):
            def __init__(self, server, sock, addr, adj, map=None):
                client.shutdown(socket.SHUT_RDWR)
                client.close()
                # client.close()
                with open("/dev/tty", "w") as out:
                  while True:
                    try: sock.getpeername()
                    except OSError:
                        print("broken", time(), file=out)
                        break
                    else: print("not yet broken", time(), file=out); sleep(1)
                ev.set()
                return HTTPChannel.__init__(self, server, sock, addr, adj, map)
        inst.channel_class = ShutdownChannel

        client.connect(("127.0.0.1", 8000))
        inst.handle_accept()

        channel = list(iter(inst._map.values()))[-1]
        self.assertEqual(channel.__class__, ShutdownChannel)
        # self.assertEqual(channel.socket.getpeername(), "")
        ev.wait()
        self.assertRaises(Exception, channel.socket.getpeername)

i.e. the server loops until getpeername breaks and then informs the test.

A test run produces output like this:

../djay_waitress/tests/test_server.py not yet broken 1694767475.3831112
not yet broken 1694767476.3845906
not yet broken 1694767477.385782
...
not yet broken 1694767536.4731994
broken 1694767537.474676

In the run above, it took about 1 min before getpeername broke. But this time varies.

d-maurer commented 12 months ago

Dieter Maurer wrote at 2023-9-15 10:53 +0200:

... The problem with your original: it can take an arbitrary time before getpeername raises an error. ... A test run produces output like this:

../djay_waitress/tests/test_server.py not yet broken 1694767475.3831112
not yet broken 1694767476.3845906
not yet broken 1694767477.385782
...
not yet broken 1694767536.4731994
broken 1694767537.474676

In the run above, it took about 1 min before getpeername broke.

I modified the output logic a bit:

                with open("/dev/tty", "w") as out:
                  st = time()
                  while True:
                    try: sock.getpeername()
                    except OSError:
                        print("broken after seconds", time() - st, file=out)
                        break
                    else: sleep(1)
                ev.set()

I now see that it takes (Linux 5.4 kernel) about 63 s before getpeername breaks. Contrary to an earlier statement, the time seems quite constant.

The test then fails with:

       sockets[0].close()
>       self.assertEqual(selects, 0, "ensure we aren't in a loop trying to write but can't")
E       AssertionError: 1 != 0 : ensure we aren't in a loop trying to write but can't

digitalresistor commented 12 months ago

Does os.getpeername() fail from then on, as in it never recovers again even for other sockets/new connections?

The more I think about it, the more that I just think that if os.getpeername() fails due to not connected/invalid socket... we just close the socket/delete the channel and try no further processing. There's no reason why it should even attempt to read/write to the socket at that point or why it should get added to the map.

d-maurer commented 12 months ago

Delta Regeer wrote at 2023-9-15 11:56 -0700:

Does os.getpeername() fail from then on, as in it never recovers again?

I only tried a few times (and those all failed). But I would be highly surprised if the behavior changed again: the remote end is shut down/closed; therefore, nothing can bring the socket back in the connected state, an ENOTCONNECTED state should persist.

The more I think about it, the more that I just think that if os.getpeername() fails due to not connected/invalid socket... we just close the socket/delete the channel and try no further processing. There's no reason why it should even attempt to read/write to the socket at that point or why it should get added to the map.

+1. This is similar to what I proposed: call set_socket (i.e. register the socket) only after getpeername succeeded.

d-maurer commented 12 months ago

Dieter Maurer wrote at 2023-9-16 00:05 +0200:

Delta Regeer wrote at 2023-9-15 11:56 -0700:

Does os.getpeername() fail from then on, as in it never recovers again?

I only tried a few times (and those all failed). But I would be highly surprised if the behavior changed again: the remote end is shut down/closed; therefore, nothing can bring the socket back in the connected state, an ENOTCONNECTED state should persist.

The more I think about it, the more that I just think that if os.getpeername() fails due to not connected/invalid socket... we just close the socket/delete the channel and try no further processing. There's no reason why it should even attempt to read/write to the socket at that point or why it should get added to the map.

I have 2 use cases for a not connected socket to be in the socket map:

  1. the main server socket (the one which calls listen/accept). It will never get connected; a completed accept will be indicated by a ready-to-read event.

  2. a "client socket" (one on which connect is called). Such a socket is not (yet) connected; its connectedness will be indicated by a "ready to write" (select) event.

Of course, this does not apply to a "server channel socket" (such as HTTPChannel). A server channel socket is returned by accept and then connected; once it gets disconnected, it will never again become connected. Therefore, a "server channel socket" (e.g. "HTTPChannel") should immediately close in its constructor if the inherited constructor has set connected to False.

djay commented 12 months ago

@d-maurer

I now see that it takes (Linux 5.4 kernel) about 63 s before getpeername breaks. Contrary to an earlier statement, the time seems quite constant.

Yep, 70s makes the test fail on mine too. I only ever tried it up to 60s. You also need keepalive enabled for both client and server... or I did have the test failing and now it's not, and I can't see what I changed... :(

But how can there realistically be a 65s delay between an accept and getpeername? There are only a few lines in between.

@digitalresistor

The more I think about it, the more that I just think that if os.getpeername() fails due to not connected/invalid socket... we just close the socket/delete the channel and try no further processing. There's no reason why it should even attempt to read/write to the socket at that point or why it should get added to the map.

Yeah, but why wasn't that always the case? I don't really understand why there seemed to be a deliberate choice not to close the connection. The code comes from before asyncore was moved into waitress. Maybe asyncore expects whoever is using it to close the connection if connected=False, and waitress doesn't do that if handle_read is not called?

djay commented 12 months ago

@d-maurer

Of course, this does not apply to a "server channel socket" (such as HTTPChannel). A server channel socket is returned by accept and then connected; once it gets disconnected, it will never again become connected. Therefore, a "server channel socket" (e.g. "HTTPChannel") should immediately close in its constructor if the inherited constructor has set connected to False.

so HTTPChannel constructor should be

        wasyncore.dispatcher.__init__(self, sock, map=map)
        if not self.connected:
            self.handle_close()

djay commented 12 months ago

@d-maurer @digitalresistor ok. I've finally managed to reproduce this bug with no sleep needed. What is required is

client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))

which creates a forced close from the client side. Then getpeername fails right away and the loop happens.
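For reference, the full reproduction client looks roughly like this (a sketch; SO_LINGER with a zero timeout makes close() abort the connection with a RST instead of a normal FIN, so the server side sees a broken connection immediately):

# Sketch of the forced-close client described above.
import socket
import struct

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# linger on, timeout 0: close() aborts the connection with a RST
client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
client.connect(("127.0.0.1", 8000))    # port used by the test above
client.close()    # getpeername() on the accepted socket now fails immediately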

djay commented 12 months ago

I spoke too soon. That test results in handle_close being called because it does a handle_read... so that's not looping.

d-maurer commented 12 months ago

Dylan Jay wrote at 2023-9-16 22:23 -0700:

... so HTTPChannel constructor should be

       wasyncore.dispatcher.__init__(self, sock, map=map)
       if not self.connected:
           self.handle_close()

Yes.

At least with the current if not self.connected: return in HTTPChannel's handle_write. But I think this code is not optimal: the comment says it should prevent a double closure, but "not connected" does not mean "closed". If double closure is to be prevented, we should have a closed flag set by handle_close, and not abuse connected.
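A small sketch of that idea (hypothetical, not an actual waitress patch):

    # Hypothetical sketch: track closure explicitly instead of overloading
    # the connected flag for the double-close guard.
    from waitress import wasyncore

    class HTTPChannel(wasyncore.dispatcher):
        closed = False

        def handle_close(self):
            if self.closed:
                return                      # explicit double-close guard
            self.closed = True
            wasyncore.dispatcher.handle_close(self)

        def handle_write(self):
            if self.closed:                 # no longer abuses "connected"
                return
            ...                             # normal flushing logic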