Broken connections with many concurrent `HEAD` requests

zanieb commented 8 months ago

Using waitress==2.1.2 and pyramid==2.0.2, I've discovered that ~1% of requests fail due to broken connections when there are many concurrent requests.

Requests fail with errors such as `BrokenResourceError`, `ReadError`, `RemoteProtocolError: Server disconnected without sending a response.`, or `Connection reset by peer`.

The server looks like:

from pyramid.config import Configurator
from pyramid.response import Response

import waitress

import logging
logger = logging.getLogger('waitress')
logger.setLevel(logging.DEBUG)

def hello_world(request):
    return Response("test")

if __name__ == '__main__':

    with Configurator() as config:
        config.add_route('hello', '/')
        config.add_view(hello_world, route_name='hello')
        app = config.make_wsgi_app()

    waitress.serve(app, host='127.0.0.1', port=8000, threads=10)

Increasing the number of threads changes nothing. With another WSGI server such as gunicorn, no requests fail.

I'm testing concurrent request success rates with the following script:

import httpx
import anyio
import traceback

ATTEMPTS = 1000
TARGET = "http://localhost:8000"
HEAD_FAILURE = []
HEAD_SUCCESS = 0
GET_FAILURE = []
GET_SUCCESS = 0

async def head(client: httpx.AsyncClient, url: str):
    global HEAD_SUCCESS
    try:
        response = await client.head(url)
        response.raise_for_status()
    except Exception as exc:
        HEAD_FAILURE.append(exc)
    else:
        HEAD_SUCCESS += 1

    print(".", end="")

async def get(client: httpx.AsyncClient, url: str):
    global GET_SUCCESS
    try:
        response = await client.get(url)
        response.raise_for_status()
    except Exception as exc:
        GET_FAILURE.append(exc)
    else:
        GET_SUCCESS += 1

    print(".", end="")

async def main():
    async with httpx.AsyncClient(timeout=httpx.Timeout(timeout=300)) as client:
        async with anyio.create_task_group() as tg:
            for _ in range(ATTEMPTS):
                tg.start_soon(head, client, TARGET)
                tg.start_soon(get, client, TARGET)

anyio.run(main)

# Report

seen = set()
for exc in HEAD_FAILURE:
    if type(exc) in seen:
        continue
    seen.add(type(exc))
    print()
    print()
    traceback.print_exception(exc)
    print()

print()
print(f"Displayed {len(seen)} unique exception types")
print(f"{len(HEAD_FAILURE)}/{len(HEAD_FAILURE) + HEAD_SUCCESS} HEAD requests failed")
print(f"{len(GET_FAILURE)}/{len(GET_FAILURE) + GET_SUCCESS} GET requests failed")

I'm testing on macOS with Python 3.11.4.

I discovered this while working with devpi, see https://github.com/devpi/devpi/issues/1022.

zanieb commented 8 months ago

With a more sophisticated testing approach that doesn't interleave HEAD requests with other methods, you can see that HEAD is the cause of this:

import httpx
import anyio
from collections import defaultdict

ATTEMPTS = 2000
TARGET = "http://localhost:8000"
METHODS = ["HEAD", "GET", "POST"]
RESULTS = defaultdict(list)

async def query(client: httpx.AsyncClient, method: str, url: str):
    try:
        response = await getattr(client, method.lower())(url)
        response.raise_for_status()
    except Exception as exc:
        RESULTS[method].append(exc)

    print(".", end="")

async def main():
    for method in METHODS:
        async with httpx.AsyncClient(timeout=httpx.Timeout(timeout=300)) as client:
            async with anyio.create_task_group() as tg:
                for _ in range(ATTEMPTS):
                    tg.start_soon(query, client, method, TARGET)

anyio.run(main)

# Report
print()
for method in METHODS:
    failures = RESULTS[method]

    seen = set()
    errors = []
    for exc in failures:
        if type(exc) in seen:
            continue
        seen.add(type(exc))
        errors.append(exc)

    print(f"{len(failures)}/{ATTEMPTS} {method} requests failed with {len(seen)} unique exception types")
    for exc in errors:
        print(f"  - {type(exc).__name__} {exc}")

255/2000 HEAD requests failed with 2 unique exception types
  - ReadError 
  - RemoteProtocolError Server disconnected without sending a response.
0/2000 GET requests failed with 0 unique exception types
0/2000 POST requests failed with 0 unique exception types
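
The report above lists each exception type once, so it does not show how the failures split between types. A minimal sketch of a per-type tally follows, assuming the same `{method: [exceptions]}` shape as `RESULTS`. The helper name `summarize` and the demo data are hypothetical and not part of the original script.

```python
from collections import Counter


def summarize(results, methods, attempts):
    """Return {method: (failure_rate, Counter of exception type names)}.

    `results` maps a method name to the list of exceptions collected for it;
    methods with no recorded failures may be missing from the mapping.
    """
    summary = {}
    for method in methods:
        failures = results.get(method, [])
        # Count failures by exception class name rather than deduplicating.
        by_type = Counter(type(exc).__name__ for exc in failures)
        summary[method] = (len(failures) / attempts, by_type)
    return summary


if __name__ == "__main__":
    # Hypothetical stand-in data; the real script would pass RESULTS,
    # METHODS, and ATTEMPTS instead.
    demo = {"HEAD": [ConnectionResetError(), ConnectionResetError(), TimeoutError()]}
    for method, (rate, by_type) in summarize(demo, ["HEAD", "GET"], attempts=10).items():
        print(f"{method}: {rate:.1%} failed {dict(by_type)}")
```

Called as `summarize(RESULTS, METHODS, ATTEMPTS)` after `anyio.run(main)`, this would give the failure rate per method and how many failures fall under each exception type.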

digitalresistor commented 7 months ago

I've merged one of the two tickets related to this. The other one needs a few minor changes; I'm working on those.

zanieb commented 6 months ago

Thank you!

Pylons / waitress

Broken connections with many concurrent `HEAD` requests #427