encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License
13.07k stars 833 forks source link

"Server disconnected" on HTTP/2 connection pool with big `keepalive_expiry`. #2983

Open jonathanslenders opened 9 months ago

jonathanslenders commented 9 months ago

When httpx is used to connect to an http/2 server, and the server disconnects, then httpx will attempt to reuse this connection for the following request. It happens when keepalive_expiry in the httpx limits is bigger than the keep_alive_timeout of the HTTP/2 web server.

To reproduce, we can run a Hypercorn http/2 server using a dummy Starlette application like this (TLS is required because of http/2):

import asyncio
from hypercorn.asyncio import serve
from hypercorn.config import Config
from starlette.applications import Starlette

app = Starlette(routes=[])

config = Config.from_mapping({})
config.bind = ["127.0.0.1:8001"]
config.certfile = ".../webserver.pem"
config.keyfile = ".../webserver.key"
config.keyfile_password = "...."
config.keep_alive_timeout = 1  # Second (is 5 by default).

asyncio.run(serve(app, config))

Then run this client code:

import asyncio
import httpx

async def main() -> None:
      async with httpx.AsyncClient(
          verify=False,
          http2=True,
          limits=httpx.Limits(
              keepalive_expiry=100,
          ),
      ) as client:
          while True:
              await client.get("https://localhost:8001/")
              await asyncio.sleep(6)  # Should be bigger than the `keep_alive_timeout` Hypercorn config.

asyncio.run(main())

Important is to ensure http2=True is set and keepalive_expiry is None or a value bigger than the corresponding option of the Hypercorn server.

For the second request, httpx tries to reuse the connection from the connection pool although it has been closed by the other side in the meantime and fails, immediately producing the following Server disconnected error:

 Traceback (most recent call last):
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
      yield
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
      resp = await self._pool.handle_async_request(req)
    File ".../lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
      raise exc
    File ".../lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
      response = await connection.handle_async_request(request)
    File ".../lib/python3.9/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
      return await self._connection.handle_async_request(request)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 185, in handle_async_request
      raise exc
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 148, in handle_async_request
      status, headers = await self._receive_response(
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 292, in _receive_response
      event = await self._receive_stream_event(request, stream_id)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 333, in _receive_stream_event
      await self._receive_events(request, stream_id)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 361, in _receive_events
      events = await self._read_incoming_data(request)
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 452, in _read_incoming_data
      raise exc
    File ".../lib/python3.9/site-packages/httpcore/_async/http2.py", line 440, in _read_incoming_data
      raise RemoteProtocolError("Server disconnected")
  httpcore.RemoteProtocolError: Server disconnected

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    ...
    File ".../our-code.py", line 191, in make_stream_request
      async with client.stream(
    File ".../lib/python3.9/contextlib.py", line 181, in __aenter__
      return await self.gen.__anext__()
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1573, in stream
      response = await self.send(
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1617, in send
      response = await self._send_handling_auth(
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
      response = await self._send_handling_redirects(
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
      response = await self._send_single_request(request)
    File ".../lib/python3.9/site-packages/httpx/_client.py", line 1719, in _send_single_request
      response = await transport.handle_async_request(request)
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
      resp = await self._pool.handle_async_request(req)
    File ".../lib/python3.9/contextlib.py", line 137, in __exit__
      self.gen.throw(typ, value, traceback)
    File ".../lib/python3.9/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
      raise mapped_exc(message) from exc
  httpx.RemoteProtocolError: Server disconnected

I started by commenting in this discussion: https://github.com/encode/httpx/discussions/2056 , but turned it in an issue after I had a clear reproducible case. Feel free to close if it's not appropriate.

tomchristie commented 9 months ago

Thanks @jonathanslenders.

First places I'd start narrowing this down are...

Different HTTP versions.

Different clients.

Different servers.

moshego189 commented 9 months ago

Hi @jonathanslenders it seems to me that the problem is hypercorn specific it just happened to me on a similar setup, i think hypercorn is closing the connection in some unexpected way when using http2

jonathanslenders commented 9 months ago

Thank you @tomchristie for looking into it!

I've been diving deeper. Maybe I have a fix (see below).

I think the problem is due to the complexity of terminating TLS connections. It's hard to half-close a TLS connection, and by doing so, the underlying TCP stream is not necessarily closed, which makes it very hard for the other end to observe a half closed TLS connection.

From httpx's point of view, when performing the second request, the HTTPConnectionState is IDLE, and _network_stream.get_extra_info('is_readable') is True. The TCP connection is still full open in both directions, and the data that is readable at this point could very well be TLS unwrap data.

Our is_socket_readable check works fine for non-TLS connections, but for a TLS connection, we have to decrypt any pending traffic to see whether the TLS stream was not terminated. Basically, that means passing any buffered data through the ssl.MemoryBio and see whether an EOF/ssl.SSLZeroReturnError comes out at the other end, indicating that the other party unwrapped the connection.

From Hypercorn's point of view, I can see that Hypercorn does call asyncio.StreamWriter.close() after the idle timeout. However, the transport is of type asyncio.sslproto._SSLProtocolTransport, which means there's no guarantee that the actual socket is closed. Looking at netstat, the TCP socket remains ESTABLISHED even after Hypercorn tries to close the writer. (@pgjones, you might want to look into this.)

Diving into httpcore, and looking at httpcore/_async/http2, in handle_async_request, I don't see is_socket_readable being called at any point in time. Where do we check when the connection is still alive for http/2+TLS connections? I don't think we do that.

The following httpcore patch does detect whether the connection is alive. I don't know which exception to raise, but this could be a starting point:

diff --git a/httpcore/_async/http2.py b/httpcore/_async/http2.py
index 8dc776f..e08e018 100644
--- a/httpcore/_async/http2.py
+++ b/httpcore/_async/http2.py
@@ -136,6 +136,10 @@ class AsyncHTTP2Connection(AsyncConnectionInterface):
             self._request_count -= 1
             raise ConnectionNotAvailable()

+        alive = await self._network_stream.is_alive()
+        if not alive:
+            raise Exception('Not alive!')
+
         try:
             kwargs = {"request": request, "stream_id": stream_id}
             async with Trace("send_request_headers", logger, request, kwargs):
diff --git a/httpcore/_backends/anyio.py b/httpcore/_backends/anyio.py
index 1ed5228..d18217f 100644
--- a/httpcore/_backends/anyio.py
+++ b/httpcore/_backends/anyio.py
@@ -1,5 +1,6 @@
 import ssl
 import typing
+from collections import deque

 import anyio

@@ -19,10 +20,13 @@ from .base import SOCKET_OPTION, AsyncNetworkBackend, AsyncNetworkStream
 class AnyIOStream(AsyncNetworkStream):
     def __init__(self, stream: anyio.abc.ByteStream) -> None:
         self._stream = stream
+        self._buffer = deque()

     async def read(
         self, max_bytes: int, timeout: typing.Optional[float] = None
     ) -> bytes:
+        if len(self._buffer) > 0:
+            return self._buffer.popleft()
         exc_map = {
             TimeoutError: ReadTimeout,
             anyio.BrokenResourceError: ReadError,
@@ -92,6 +96,25 @@ class AnyIOStream(AsyncNetworkStream):
             return is_socket_readable(sock)
         return None

+    async def is_alive(self) -> bool:
+        """
+        Test whether the connection is still alive. If this is a TLS stream,
+        then that is different from testing whether the socket is still alive.
+        It's possible that the other end did a half-close of the TLS stream,
+        but not a half-close of the underlying TCP stream.
+
+        We test whether it's alive by trying to read data. When we get b"", we
+        know EOF was reached.
+        """
+        try:
+            data = await self.read(max_bytes=1, timeout=0.01)
+        except ReadTimeout:
+            return True
+        if data == b'':
+            return False
+        self._buffer.append(data)
+        return True
+

 class AnyIOBackend(AsyncNetworkBackend):
     async def connect_tcp(
zanieb commented 9 months ago

Looks similar to the previously resolved https://github.com/encode/httpcore/pull/679

Thanks for all the detailed commentary!