MagicStack / uvloop

Ultra fast asyncio event loop.
Apache License 2.0
10.44k stars 548 forks source link

Information exposure bug: response leaking to wrong requests in `uvicorn`-based server under heavy load #645

Open MarkusSintonen opened 6 days ago

MarkusSintonen commented 6 days ago

We encountered a very nasty data leakage bug with uvloop. It is leaking responses to incorrect HTTP requests in a uvicorn based server under a heavy load. This caused some users data to leak into requests of other users leading to a incorrect information exposure.

This issue is caused solely by uvloop as removal of it fixed the issue. Relying on vanilla asyncio does not have the same issue.

The issue happened under high load situations. The service processes some 800 million requests per day but 100 requests in a day had the wrong responses from other concurrently happening requests. It seemed to happen in a situations when there is a higher load. Also the issue sometimes correlated with other issues we experienced with uvloop:

Sometimes we observed these strange and bad looking RuntimeErrors coming from depths of uvloop at about same time as we saw the incorrect responses coming from requests. But this did not happen always in correlation. (The above errors also got fixed by removal of uvloop).

I haven't been able to reproduce this as it seems to only happen under heavy load situations and rarely enough (but still bad to leak information).

It seems uvloop might have some major issues in its TCP/socket/stream implementation that it tries to some times use incorrect already used socket like the RuntimeErrors would hint. Not sure is it actually related to the data leakage issue.

Kludex commented 6 days ago

Is this duplicated of https://github.com/MagicStack/uvloop/issues/631 ?

MarkusSintonen commented 6 days ago

Is this duplicated of #631 ?

I dont know, that seems to be about handling multiple requests for the same path the response took longer than expected to arrive at the client.. So something about performance. But it could be related.

MarkusSintonen commented 6 days ago

I hope eg uvicorn would drop usage of uvloop when doing pip install 'uvicorn[standard]'. It seems uvloop has quite a big amount issues currently in its connection handling implementation (the mentioned RuntimeError bugs etc). Also the perf claims it has in README are questionable (https://github.com/MagicStack/uvloop/issues/566).

todddialpad commented 2 days ago

I have been experiencing failures when under heavy load. This started with the aiohttp/aiohappyeyeballs changes in that projects 3.10.0 release. After some deep-diving I think the key change was that project's switch from allowing loop.create_connection to create and connect the underlying socket/file descriptor, to manually creating and connecting the socket and then passing the sock argument to loop.create_connection.

I'm guessing there is a problem with socket ownership, especially when an exception happens during the loop.create_connection call. I'm not completely sure of the exact code path, but to test the idea, I've tried the following patch

diff --git a/uvloop/loop.pyx b/uvloop/loop.pyx
index f9a5a23..9797684 100644
--- a/uvloop/loop.pyx
+++ b/uvloop/loop.pyx
@@ -1877,6 +1877,7 @@ cdef class Loop:
             AddrInfo ai_local = None
             AddrInfo ai_remote
             TCPTransport tr
+            int sockfd

             system.addrinfo *rai = NULL
             system.addrinfo *lai = NULL
@@ -2060,8 +2061,9 @@ cdef class Loop:
             waiter = self._new_future()
             tr = TCPTransport.new(self, protocol, None, waiter, context)
             try:
+                sockfd = sock.detach()
                 # libuv will make socket non-blocking
-                tr._open(sock.fileno())
+                tr._open(sockfd)
                 tr._init_protocol()
                 await waiter
             except (KeyboardInterrupt, SystemExit):
@@ -2075,8 +2077,6 @@ cdef class Loop:
                 tr._close()
                 raise

-            tr._attach_fileobj(sock)
-
         if ssl:
             app_transport = protocol._get_app_transport(context)
             try:

and the failures have gone away.

1st1 commented 21 hours ago

Interesting. We'll get this PR merged likely in a week. Thanks for the thorough investigation and good issue full of details.