Closed: djay closed this issue 4 months ago.
Dylan Jay wrote at 2023-9-16 23:06 -0700:
@d-maurer @digitalresistor ok. I've finally managed to reproduce this bug with no sleep needed. What is required is
client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
which creates a forced close from the client side. Then getpeername fails right away and the loop happens.
Congratulations!
But we still need to understand why readable returns False (otherwise, the closed connection should result in a 0-length read and the connection should get closed).
Maybe, we have the following scenario:
- client opens the connection, sends a request and immediately closes the connection.
- server accepts but getpeername already fails; connected is set to False
- server reads the request, processes it and this produces output (which lets readable return False).
- We are now in the state where "readable" events are ignored and handle_write is effectively a no-op.
@d-maurer I was incorrect. I think it's not possible to simulate a broken connection where nothing is sent in Python. A close with SO_LINGER set to zero sends a RST, which is why waitress closes the connection.
Maybe it can be done somehow with scapy - https://stackoverflow.com/questions/67515275/http-request-with-scapy. Not sure. I'm also not sure what could be sent to close the connection that doesn't result in a recv and then a close.
Dieter Maurer wrote at 2023-9-17 08:39 +0200:
... But we still need to understand why readable returns False (otherwise, the closed connection should result in a 0-length read and the connection should get closed). Maybe, we have the following scenario:
- client opens the connection, sends a request and immediately closes the connection.
- server accepts but getpeername already fails; connected is set to False
- server reads the request, processes it and this produces output (which lets readable return False).
- We are now in the state where "readable" events are ignored and handle_write is effectively a no-op.
I could get this scenario with the following code for 2 interactive sessions:
client:
from socket import socket, AF_INET, SOCK_STREAM, SHUT_RDWR, SOL_SOCKET, SO_LINGER
from struct import pack
from time import sleep
cs = socket(AF_INET, SOCK_STREAM)
cs.setsockopt(SOL_SOCKET, SO_LINGER, pack('ii', 1, 0))
cs.connect(("localhost", 10000))
cs.send(b"1")
sleep(1)
cs.shutdown(SHUT_RDWR)
cs.close()
server:
from socket import *
ss = socket(AF_INET, SOCK_STREAM)
ss.bind(("localhost", 10000))
ss.listen()
cs, addr = ss.accept()
# allow the client code to run
cs.getpeername()
cs.recv(1)
cs.recv(1)
The getpeername fails, the first recv returns b"1", and the following recv calls return empty data.
Thus, if the first recv receives a request and this request is processed sufficiently fast that it can produce output before the next poll round, we enter the observed busy-loop state.
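A toy illustration of why this state spins (a sketch, not waitress code): once select keeps reporting the socket writable and the write handler returns without writing or closing, every pass through the loop makes no progress and never blocks.

# Sketch: select keeps reporting the socket writable, the write handler does
# nothing, so the loop neither blocks nor makes progress.
import select
import socket

a, b = socket.socketpair()          # `a` stands in for the stuck channel's socket

spins = 0
while spins < 5:                    # bounded here; the real loop spins forever
    _, writable, _ = select.select([], [a], [], 1.0)
    if writable:
        pass                        # a handle_write() that returns without writing or closing
    spins += 1

print("select reported writable", spins, "times without any progress")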
@d-maurer can you get this working with the actual test case? Because I did try doing a send. It didn't make a difference: no matter whether anything is recv'd, a close with linger 0 sends a RST, which gets received, and then the server removes the channel cleanly.
Dylan Jay wrote at 2023-9-17 07:55 -0700:
@d-maurer can you get this working with the actual test case?
I did not try.
Because I did try doing a send. It didn't make a difference: no matter whether anything is recv'd, a close with linger 0 sends a RST, which gets received, and then the server removes the channel cleanly.
The sleep in my client code is there to give the communication channel enough time to transfer the message to the server before the shutdown/close. The documentation says that pending messages are discarded. In my case, the message already in the server was not discarded but delivered. A sleep (or other delay) might be necessary.
The documentation hints at significant differences (wrt. "linger 0") for sockets in blocking and non-blocking mode, respectively. In my case, all sockets are blocking; in your test, only the client socket is.
Dylan Jay wrote at 2023-9-17 07:55 -0700:
@d-maurer can you get this working with the actual test case? Because I did try doing a send. It didn't make a difference: no matter whether anything is recv'd, a close with linger 0 sends a RST, which gets received, and then the server removes the channel cleanly.
In your test case, it will not be enough to send b"1". The HTTPChannel will try further reads until a complete HTTP request has been read in; only then can request processing start and produce output (which lets readable() return False, blocking further recv and thereby the determination that the remote end was closed).
@d-maurer I see what you're saying now. The only way for readable to be False while writable is True is for total_outbufs_len > 0, which means the app has data to write back.
But I haven't been able to get getpeername to fail with shutdown alone, and close seems to make recv close on the server side.
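For reference, a simplified sketch of the two predicates under discussion; the exact conditions in waitress's HTTPChannel differ (will_close, close_when_flushed and the request-lookahead limit also feed into readable()), so treat this as an approximation rather than the real implementation.

# Simplified approximation (assumption, not waitress source): once output is
# pending, readable() goes False while writable() stays True, which is exactly
# the combination needed for the busy loop.
class SketchChannel:
    def __init__(self):
        self.total_outbufs_len = 0      # bytes queued to send to the client
        self.will_close = False
        self.close_when_flushed = False

    def readable(self):
        # Stop reading while output is pending or a close is in progress.
        return not (self.will_close or self.close_when_flushed or self.total_outbufs_len)

    def writable(self):
        # Interested in write events whenever output is pending or a close is requested.
        return bool(self.total_outbufs_len or self.will_close or self.close_when_flushed)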
Dylan Jay wrote at 2023-9-17 11:29 -0700:
... But I haven't been able to get getpeername to fail with shutdown alone, and close seems to make recv close on the server side.
recv only closes the server side if it returns empty data. In my example code, the first recv returns non-empty data (because the shutdown/close happens after the data has reached the server and has been acknowledged).
@d-maurer
Adding
def received(self, data):
res = HTTPChannel.received(self, data)
if data:
# Fake app returning data fast
self.total_outbufs_len = 1
# self.request.completed = True
# self.requests.append(DummyParser())
pass
return res
reproduces the loop, but now I'm trying to see if it's possible for a request to return a result and be written when not connected.
@d-maurer and I can't see how it's possible to send anything, due to this line, so total_outbufs_len can't be more than 0:
def write_soon(self, data):
if not self.connected:
# if the socket is closed then interrupt the task so that it
# can cleanup possibly before the app_iter is exhausted
raise ClientDisconnected
@d-maurer maybe one scenario is that close_on_finish = True due to an exception, but when it tries to send, the socket buffer is full for some reason. Then we get the EWOULDBLOCK below, will_close never gets set, and no close happens.
def send(self, data, do_close=True):
try:
result = self.socket.send(data)
return result
except OSError as why:
if why.args[0] == EWOULDBLOCK:
return 0
elif why.args[0] in _DISCONNECTED:
if do_close:
self.handle_close()
return 0
else:
raise
@d-maurer close_on_finish = True isn't possible, but there is one way data gets written even when disconnected: that's if the request asks for a continue to be sent. But I've no idea how to get the EWOULDBLOCK when sending a continue.
@d-maurer ok, I can get it to send unlimited continues, but the problem now is that send gets a disconnected error before it hits EWOULDBLOCK, and we never get a total_outbufs_len creating the loop.
Dylan Jay wrote at 2023-9-17 20:50 -0700:
@d-maurer maybe one scenario is that close_on_finish = True due to an exception, but when it tries to send, the socket buffer is full for some reason. Then we get the EWOULDBLOCK below, will_close never gets set, and no close happens.
When the socket buffer is full, select should not signal "ready to write". But that is what your strace has shown: select signals "ready to write".

def send(self, data, do_close=True):
    try:
        result = self.socket.send(data)
        return result
    except OSError as why:
        if why.args[0] == EWOULDBLOCK:
            return 0
        elif why.args[0] in _DISCONNECTED:
            if do_close:
                self.handle_close()
            return 0
        else:
            raise
It is possible that the send above is called without strace seeing an output operation, if an error is detected at the Python (or C library) level (preventing the output system call).
In principle, "disconnected" could be such a condition; but in this case, handle_close should be called (I have not yet seen a call with not do_close).
It is highly unlikely that Python/the C library could know a blocking result without asking the OS.
Dylan Jay wrote at 2023-9-17 22:40 -0700:
@d-maurer ok, I can get it to send unlimited continues, but the problem now is that send gets a disconnected error before it hits EWOULDBLOCK, and we never get a total_outbufs_len creating the loop.
The Python socket tutorial (--> "https://docs.python.org/3/howto/sockets.html#socket-howto") indicates (in the MySocket.mysend example of section "Using a socket") that send does not indicate a disconnected socket via an exception but by returning 0.
The HowTo might be wrong; however, the indicated behavior of send would be symmetric to that of recv.
The send code you have shown does not look for a 0 return value.
Dieter Maurer wrote at 2023-9-18 08:54 +0200:
Dylan Jay wrote at 2023-9-17 22:40 -0700:
@d-maurer ok, I can get it to send unlimited continues, but the problem now is that send gets a disconnected error before it hits EWOULDBLOCK, and we never get a total_outbufs_len creating the loop.
The Python socket tutorial (--> "https://docs.python.org/3/howto/sockets.html#socket-howto") indicates (in the MySocket.mysend example of section "Using a socket") that send does not indicate a disconnected socket via an exception but by returning 0. The HowTo might be wrong; however, the indicated behavior of send would be symmetric to that of recv.
The tutorial is wrong: on my Linux (5.4) system, send on a connection closed by the remote end does not return 0 but raises OSError with errno 32 (--> EPIPE).
Thus, waitress should correctly recognize the disconnection (and call handle_close unless not do_close).
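A small sketch to check this claim (an assumption about typical Linux behaviour, not code from the thread): after the peer is gone, send raises instead of returning 0.

# Minimal check of send() after the remote end has closed: on Linux the send
# typically raises OSError (BrokenPipeError / EPIPE, errno 32, or ECONNRESET)
# rather than returning 0. Exact behaviour is platform-dependent.
import errno
import socket

server, client = socket.socketpair()
client.close()                              # the "remote end" goes away

try:
    server.send(b"x")                       # may already fail, or be buffered
    server.send(b"x")                       # by now the peer closure is visible
except OSError as exc:
    print("send raised", errno.errorcode[exc.errno])
else:
    print("send returned without an error")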
@d-maurer
When the socket buffer is full, select should not signal "ready to write". But that is what your strace has shown:
- the connection is not interested in read (i.e. "not readable()")
- the connection is interested in write (i.e. "writable()")
- select signals "ready to write"
- no output is tried
Being both not readable and writable is possible if total_outbufs_len > 0, i.e. there is more data left to write than was sent. The only way that can happen is one big read of a series of pipelined headers, i.e. for separate requests, each of which is "empty". That sets up a situation where self.send_continue() keeps happening over and over, and that seems to be the only way you can send when not self.connected:
def received(self, data):
"""
Receives input asynchronously and assigns one or more requests to the
channel.
"""
if not data:
return False
with self.requests_lock:
while data:
if self.request is None:
self.request = self.parser_class(self.adj)
n = self.request.received(data)
# if there are requests queued, we can not send the continue
# header yet since the responses need to be kept in order
if (
self.request.expect_continue
and self.request.headers_finished
and not self.requests
and not self.sent_continue
):
self.send_continue()
if self.request.completed:
# The request (with the body) is ready to use.
self.sent_continue = False
if not self.request.empty:
self.requests.append(self.request)
if len(self.requests) == 1:
# self.requests was empty before so the main thread
# is in charge of starting the task. Otherwise,
# service() will add a new task after each request
# has been processed
self.server.add_task(self)
self.request = None
if n >= len(data):
break
data = data[n:]
return True
So if those sends don't fail with an OSError but instead only partially send or get EWOULDBLOCK, then some data is left to send, and, assuming the requests are finished at that point, it could be stuck in the loop. If, as you say, select won't report a write when the socket send buffer is full, then this all falls apart.
send_continue() is the only way I can see of total_outbufs_len being > 0.
Maybe there is some way to achieve this loop by making close_when_flushed == True...
Dylan Jay wrote at 2023-9-17 22:40 -0700:
@d-maurer ok, I can get it to send unlimited continues, but the problem now is that send gets a disconnected error before it hits EWOULDBLOCK, and we never get a total_outbufs_len creating the loop.
Apparently, we need more information. Maybe you are able to implement something like the following idea:
You change the handle_write code to check a condition "extended diagnostic required". If this condition is fulfilled, it logs all relevant information about the state of the connection to be written (this includes fileno(), connected, total_outbufs_len, will_close, close_when_flushed, ...). You use this in your production environment and set the condition once you see the 100 % CPU usage.
I see at least 2 options to implement the condition:
- via a global variable, set/reset via a view; handle_write could monitor the variable and log if true
- via a file; handle_write would check for the existence of the file and log in this case.
The diagnostics will reveal the state of the connection responsible for the busy loop. Hopefully, this will help us to understand how it could have been entered.
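A rough sketch of the file-based variant (the marker-file path and helper name are illustrative; the logged attributes mirror the ones named above): the helper would be called at the top of handle_write, and the switch is flipped in production by creating the marker file once the CPU pegs at 100 %.

# Sketch of a file-based diagnostic switch for handle_write.
import logging
import os

DIAG_FLAG = "/tmp/waitress-diagnose"        # hypothetical marker file
log = logging.getLogger("waitress.diagnose")

def maybe_log_channel_state(channel):
    # Only log while the marker file exists, so production stays quiet otherwise.
    if not os.path.exists(DIAG_FLAG):
        return
    log.warning(
        "channel state: fileno=%s connected=%s total_outbufs_len=%s "
        "will_close=%s close_when_flushed=%s",
        channel.fileno(), channel.connected, channel.total_outbufs_len,
        channel.will_close, channel.close_when_flushed,
    )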
Dylan Jay wrote at 2023-9-18 01:21 -0700:
@d-maurer ... If, as you say, select won't report a write when the socket send buffer is full, then this all falls apart.
Usually, select will not report "ready to write" if the send buffer is full. There are exceptions, however: if select learns that the remote end has been shut down for reads, it may signal "ready to write" to allow the application to learn about this condition (via a failing send).
The select promise is: it announces a ready event if the corresponding IO operation would not block. Thus, this IO operation may either send or receive (respectively) some data, or fail with an error condition (different from EWOULDBLOCK).
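A quick sketch of that promise (not waitress code): after the peer disappears, select still reports the socket writable, and the write then fails promptly instead of blocking.

# select reports "ready to write" on a socket whose peer has closed; the send
# then fails with an error (e.g. BrokenPipeError) rather than blocking.
import select
import socket

a, b = socket.socketpair()
b.close()                                   # the remote end disappears

_, writable, _ = select.select([], [a], [], 1.0)
print("reported writable:", a in writable)  # typically True
try:
    a.send(b"x")
except OSError as exc:
    print("send failed with:", exc)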
@d-maurer I think I can reproduce it reasonably simply with close_when_flushed.
The request just needs to have an error in it. Then, when service is called:
def service(self):
"""Execute one request. If there are more, we add another task to the
server at the end."""
request = self.requests[0]
if request.error:
task = self.error_task_class(self, request)
else:
task = self.task_class(self, request)
try:
if self.connected:
task.service()
else:
task.close_on_finish = True
except ClientDisconnected:
...
if task.close_on_finish:
with self.requests_lock:
self.close_when_flushed = True
for request in self.requests:
request.close()
self.requests = []
We get a new error task that has close_on_finish. That then seems to create a loop by setting close_when_flushed, since it never gets to write that flush due to not being in a connected state.
I've updated my test to show this working.
Not yet sure if I'm calling service at a realistic time or not.
Dylan Jay wrote at 2023-9-18 03:31 -0700:
... I've updated my test to show this working
Not yet sure if I'm calling service at a realistic time or not.
At least, you have shown:
1. close_when_flushed and not connected would produce output like that shown by your strace
2. if it is possible in the not connected state to service a task sufficiently fast that it sets channel.close_when_flushed before the next poll, then the state above is entered.
The channel.service is likely executed in a "worker thread". All we need to achieve 2. is that the thread switch from the main thread is sufficiently fast and there is no switch back until channel.close_when_flushed is set.
@d-maurer
- if it is possible in the not connected state to service a task sufficiently fast that it sets channel.close_when_flushed before the next poll, then the state above is entered.
Speed isn't needed. It will sit there waiting, since the request is finished, so it's got nothing to write or read. At least if self.adj.channel_request_lookahead == 0. I guess that setting is designed to prevent this case, so I will try setting it.
No, channel_request_lookahead is used to allow waitress to try and read more than the first request (normally it would read a request, wait for a response to be created/processed, and only then read the next request, so that it wouldn't read a bunch of data/HTTP-pipelined requests from the wire without being able to respond, for example if the connection was going to be closed early)... this however meant there was no good way to let the app know that the client disconnected, for long-running requests that wanted to check before they committed data to the database, for example.
That is why waitress has a waitress.client_disconnected callable in the environment that allows the app to check if the remote client has disconnected, but that only works if we continue to read() from the socket and thus get notified when the remote hangs up.
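As an illustration of how that is meant to be used (a sketch: the app and the chosen lookahead value are made up, while channel_request_lookahead and the waitress.client_disconnected environ key are the documented knobs being discussed):

# With channel_request_lookahead > 0, waitress keeps reading from the socket
# while a request is serviced, so the environ callable can report a remote
# hang-up mid-request and the app can skip expensive work.
from waitress import serve

def app(environ, start_response):
    # ... long-running work would happen here ...
    disconnected = environ.get("waitress.client_disconnected")
    if disconnected is not None and disconnected():
        # Client went away: skip the commit.
        start_response("500 Internal Server Error", [("Content-Type", "text/plain")])
        return [b"client disconnected"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"committed"]

if __name__ == "__main__":
    serve(app, listen="127.0.0.1:8080", channel_request_lookahead=5)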
@d-maurer maybe, but it will result in the socket being closed, since it will allow it to read the RST. That's also how it works in the test. I'll try it in production.
But this is still a bug. I've put in two fixes.
def handle_write(self):
# Precondition: there's data in the out buffer to be sent, or
# there's a pending will_close request
if not self.connected and not self.close_when_flushed:
# we don't want to close the channel twice
return
def __init__(self, server, sock, addr, adj, map=None):
self.server = server
self.adj = adj
self.outbufs = [OverflowableBuffer(adj.outbuf_overflow)]
self.creation_time = self.last_activity = time.time()
self.sendbuf_len = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
# requests_lock used to push/pop requests and modify the request that is
# currently being created
self.requests_lock = threading.Lock()
# outbuf_lock used to access any outbuf (expected to use an RLock)
self.outbuf_lock = threading.Condition()
wasyncore.dispatcher.__init__(self, sock, map=map)
if not self.connected:
# Sometimes can be closed quickly and getpeername fails.
self.handle_close()
# Don't let wasyncore.dispatcher throttle self.addr on us.
self.addr = addr
self.requests = []
@d-maurer I've also worked out why maintenance never closes the connection and put in a fix for that too. Unfortunately, there doesn't seem to be a test for this close-called-twice case, so it is still not clear why the line that caused the loop is there in the first place, and whether it isn't better just to get rid of it.
Well, this is at least the commit that put this in: https://github.com/Pylons/waitress/commit/9a7d2e5ff9405f8ae8234449f86ba888b7e142e4
Dylan Jay wrote at 2023-9-18 08:49 -0700:
@d-maurer maybe, but it will result in the socket being closed, since it will allow it to read the RST. That's also how it works in the test. I'll try it in production.
But this is still a bug. I've put in two fixes.

def handle_write(self):
    # Precondition: there's data in the out buffer to be sent, or
    # there's a pending will_close request
    if not self.connected and not self.close_when_flushed:
        # we don't want to close the channel twice
        return

Maybe, this change is not yet optimal:
The comment indicates that the purpose is to avoid closing the channel twice. It likely addresses the following scenario: the previous handle_read has detected a remote closure and closed the channel; the following handle_write may try to close the channel again, which should be prevented.
My proposal: do not abuse connected for this logic. Instead, introduce an additional attribute closed, initially False and set to True in _dispatcher.close. This way, closed is True if and only if close has already been called. The code above could become if self.closed: return.
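A minimal sketch of that proposal (the classes are simplified stand-ins for wasyncore.dispatcher and HTTPChannel, not the actual waitress code):

# Track closure explicitly instead of overloading `connected`: set the flag
# exactly once in close(), and let handle_write bail out if close() already ran.
class Dispatcher:
    closed = False

    def close(self):
        if self.closed:
            return
        self.closed = True
        # ... remove the channel from the socket map and close the socket ...

class Channel(Dispatcher):
    def handle_write(self):
        if self.closed:
            # close() already ran (e.g. handle_read saw EOF); never close twice.
            return
        # ... flush outbufs / honour will_close and close_when_flushed as before ...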
Dylan Jay wrote at 2023-9-18 08:49 -0700:
@d-maurer maybe, but it will result in the socket being closed, since it will allow it to read the RST. That's also how it works in the test. I'll try it in production.
That's where my speed argument may come in: when the state change from channel.service() (which likely requires a thread switch) happens after the next poll, channel.readable() still returns True in the poll and the channel has the chance to learn that the remote end was closed (closing the channel); only when it happens before the next poll does it prevent readable from returning True, and the remote end's closing remains unnoticed.
Dylan Jay wrote at 2023-9-19 21:16 -0700:
@d-maurer I've also worked out why maintenance never closes the connection and put in a fix for that too. Unfortunately, there doesn't seem to be a test for this close-called-twice case, so it is still not clear why the line that caused the loop is there in the first place, and whether it isn't better just to get rid of it.
Scenario:
The channel is both readable and writable (not sure whether this is possible); when, in this state, recv returns empty data, it will close the channel; a following handle_write (in the same poll call) will try to close a second time (because the send will (likely) fail with an exception).
The current PR has been running in production for a couple of weeks without the bug recurring, so that at least confirms the source of the issue. But I will still rewrite the PR with the changes mentioned above.
@djay I have not been able to reproduce the original error with the current released version of waitress. It is somewhat frustrating that I can't reproduce it.
@digitalresistor I merged in master, reverted my fixes, and uncommented the test case code that reproduces it, and it showed that it still went into a loop. How did you try to reproduce it?
@djay I have not been using the test code you created. I wanted to reproduce it without trying to set up the conditions to match exactly, or forcing a socket into a particular state. I have been testing with waitress running on a RHEL 8.9 system, and I have been unable to get getpeername() to fail in those conditions, even while trying to do the half-shutdown and stress testing waitress at the same time. I have not manually manipulated SO_LINGER to set up the conditions as specified, just using the defaults from the OS.
I also would expect to see far more instances of this error happening out in the wild; waitress is used extensively, and I have yet to see it happen on any of my applications that are running in production.
I don't disagree that there is an issue and we can solve it, but I find it interesting that it is not easily reproducible on test systems I have.
The current PR does the closing much later. I'd much rather see it get handled when getpeername() fails, before we even attempt to do anything else with the socket or start setting up a channel; if we can avoid doing extra work, let's avoid doing extra work.
Then we can rework the logic for writable as well to make sure that we can't busy-loop there.
Delta Regeer wrote at 2024-2-4 21:02 -0800:
@djay I have not been using the test code you created. I wanted to reproduce it without trying to set up the conditions to match exactly, or forcing a socket into a particular state. I have been testing with waitress running on a RHEL 8.9 system, and I have been unable to get getpeername() to fail in those conditions, even while trying to do the half-shutdown and stress testing waitress at the same time. I have not manually manipulated SO_LINGER to set up the conditions as specified, just using the defaults from the OS.
"Normal" browsers (and other HTTP agents) likely use SO_LINGER rarely. I assume that this is the reason why we see only a few cases.
@digitalresistor The circumstances are very particular. It was something we were getting from pentests only, I believe. An invalid request is sent and immediately closed by the client before the channel could be created. It would not be easy to reproduce, as you would need the invalid request and a loaded enough server that the channel is not created too quickly.
The current PR does the closing much later. I'd much rather see it get handled when getpeername() fails, before we even attempt to do anything else with the socket or start setting up a channel; if we can avoid doing extra work, let's avoid doing extra work.
This part ensures it's closed if getpeername fails.
And I added some other tests to show you how the other fix prevents any looping if, for any other reason, channel.connected becomes False and we have an error in the request, even though that should not happen.
@djay I will take another look at this over the next week or so. I appreciate the follow-up!
@digitalresistor I wouldn't think about it too long. Basically, I've given a recipe to DDoS any waitress site: lock up one thread with a single request. This would happen every Sunday with this repeated pentest. Once I deployed the fix, it never happened again. Leaving a bug like this in doesn't seem right to me.
I've created a new PR that removes getpeername() entirely (it is not useful in the context of waitress) and cleans up some code related to self.connected, since calling self.handle_close() multiple times is not an issue.
Could you drop this into your system instead and take a look whether you still see the same issue: https://github.com/Pylons/waitress/pull/435
@digitalresistor I can. But are you sure that there is no other scenario where self.connected is False, so it goes into a loop and can't ever get closed down? I.e. this line is still problematic, I think: https://github.com/Pylons/waitress/pull/419/files#diff-5cb215c6142fa7daad673c77fcb7f3bc0a0630e18e3487a5ac0894e90f1b2071L95
@digitalresistor sorry, I see you removed this also. OK, I will test this.
@djay do you have any feedback on the newly proposed changes?
@digitalresistor sorry for the delay on this. It was deployed earlier this week, but the weekly pentest only happens once a week on a Sunday, so Monday we should know.
@digitalresistor it seems to be running fine in prod with what we presume are the same pentests being thrown at it.
Following on from debugging in this issue - https://github.com/collective/haufe.requestmonitoring/issues/15
What we see is waitress switching to 100% CPU and staying there. It is happening in production randomly (within a week), and we haven't traced it back to a certain request.
Using a sampling profiler on waitress with 2 threads (in prod), we identified the thread using the CPU as the main thread (top -H); this is the profile. Note that since this is prod there are other requests, so not all activity is related to the looping causing this bug.
From profiling, it looks like the channel is writable but channel.connected == False. So then it goes into a loop without writing or closing, since it never actually does anything to the socket. https://github.com/Pylons/waitress/blob/main/src/waitress/channel.py#L98
EDIT: My suspicion would be that what started this was a client that shut down (half-close) very quickly after connecting, and this happened before the dispatcher finished being set up. This causes getpeername to fail with EINVAL and connected = False.
https://github.com/Pylons/waitress/blob/4f6789b035610e0552738cdc4b35ca809a592d48/src/waitress/wasyncore.py#L310
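A small sketch of the suspected trigger (not waitress code, and timing-dependent, so getpeername may not fail on every run): the client connects, forces a RST via SO_LINGER(1, 0) and closes immediately, so that by the time the server calls getpeername() on the accepted socket it can fail, which is the state wasyncore maps to connected = False.

# Client forces a RST with SO_LINGER(1, 0) and closes right after connecting;
# the server then accepts and probes the peer address.
import socket
import struct

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
client.connect(server.getsockname())
client.close()                          # RST is sent immediately

conn, addr = server.accept()
try:
    print("peer:", conn.getpeername())
except OSError as exc:
    print("getpeername failed:", exc)   # e.g. ENOTCONN/EINVAL, depending on timing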
Could be the same issue as https://github.com/Pylons/waitress/issues/411, but hard to tell.
One fix is in #419, but there could be better ways?