django / asgiref

ASGI specification and utilities
https://asgi.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.46k stars 207 forks source link

Communicating client disconnects without receiving the request body #419

Closed adriangb closed 10 months ago

adriangb commented 10 months ago

Currently, the only way an application can check if the client is disconnected is by calling receive(). This is problematic because it forces clients to keep pulling the request body into memory to check if the client disconnected, even if the application ultimately doesn't want to read the request body. Can we consider an alternative approach to communicate that the client disconnected?

One thing I've thought of is adding optional arguments to receive(): await receive(body=False, disconnect=True) or something like that. Unfortunately, I think it'd be a breaking change.

Does anyone have any ideas on how we can tackle this?

As a motivating use case, Starlette checks for client disconnects when streaming a response back. This makes sense because even if the server handles send() being called as a no-op after the client disconnects the application might be stuck in a really long loop doing work that gets thrown away for no reason.

Kludex commented 10 months ago

I think this is discussed on: https://github.com/django/asgiref/issues/66

I promise to create a PR to modify the spec to raise IOError on send().

adriangb commented 10 months ago

If you're sure it's a duplicate please close this now. I'll read through that in the next day or so.

adriangb commented 10 months ago

This does seem like a dupe, closing

Kludex commented 9 months ago

Adrian, I need to clarify something here, since I'm studying https://github.com/django/asgiref/issues/66.

This is problematic because it forces clients to keep pulling the request body into memory to check if the client disconnected [...]

There's a typo here, right? You meant ASGI application, and not clients.

If the typo is correct, then the above is actually not true. It really doesn't matter if the ASGI application calls receive(). The body is loaded in memory by the ASGI server anyway.

adriangb commented 9 months ago

Yes, it's a typo. I mean the ASGI application.

Do you mean that the server has to read the request body anyway to see if the client disconnects? I guess I don't know exactly what situations lead to the server detecting if the client is disconnected. If that's the case, then does that mean that unless either (1) the application calls receive() (what Starlette currently does in StreamingResponse) or (2) the server reads chunks and discards them (essentially doing what Starlette does but in the ASGI server) then there's no way to know the client disconnected?

Kludex commented 9 months ago

Do you mean that the server has to read the request body anyway to see if the client disconnects?

No.

This is problematic because it forces ASGI application to keep pulling the request body into memory to check if the client disconnected [...]

What I mean is that the ASGI application calling receive() doesn't force the server to load the request in body in memory. What receive() does is ask for the next ASGI event. The logic internally is: if the client is disconnect, send the disconnect event, else, send the body (which is already in memory).

adriangb commented 9 months ago

Let’s focus on the simple return StreamingResponse(request.stream()). Or any case using StreamingResponse where the client sends a request body but the application never intended to read it. Just the fact that returning a response involves reading the request body makes no sense.

I’m also not sure why the server would already have the entire body in memory if it’s a chunk-encoded request / the client is streaming it to the server.

Kludex commented 9 months ago

I’m also not sure why the server would already have the entire body in memory if it’s a chunk-encoded request / the client is streaming it to the server.

I'm saying that the ASGI application does not control when the server will load the body (either entirely or partially) in memory. If it's a chunk, the chunk will be already in memory when you call receive() from the application.

The way you are describing is: the application will call receive(), and then the server will load body in memory. This is not the case. When the application calls receive(), the chunk the application will receive will be the body the server was able to receive till that moment.

Maybe this example helps?

import asyncio
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/")
async def read_streaming_body(request: Request):
    # wait for the client send all the body
    await asyncio.sleep(2)
    async for chunk in request.stream():
        print(chunk)
    return {"ok": True}

if __name__ == "__main__":
    import httpx

    def upload_bytes():
        yield b"Hello, "
        time.sleep(1)
        yield b"world!"

    httpx.post("http://localhost:8002/", content=upload_bytes())

I didn't get exactly what's the goal of your first paragraph, but maybe the example above already helps on that?

adriangb commented 9 months ago

I understand the server will have a chunk in memory, but I thought it doesn’t load all of the chunks into memory unless you continuously call receive(), which is exactly what the application ends up doing: https://github.com/encode/starlette/blob/93494a46ee0f2e8b1eeafd12e466ebac15037b41/starlette/responses.py#L232

I’m surprised the server would buffer unlimited data from the client like your example appears to be implying.

Kludex commented 9 months ago

I’m surprised the server would buffer unlimited data from the client like your example appears to be implying.

On Uvicorn, it's not unlimited. There's a limit per connection (for security reasons).

I understand the server will have a chunk in memory, but I thought it doesn’t load all of the chunks into memory unless you continuously call receive(), which is exactly what the application ends up doing: encode/starlette@93494a4/starlette/responses.py#L232

Those lines are saying: "I'm only interested in the disconnect message. Ignore the other messages." That logic is not interested in throwing away the body - nor trying to get the body.

adriangb commented 9 months ago

But isn't that what it ends up doing? Isn't the fact that it reads the request body the reason that this app doesn't work?

import time
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/")
async def echo(request: Request):
    return StreamingResponse(request.stream())

if __name__ == "__main__":
    import httpx

    def upload_bytes():
        yield b"Hello, "
        time.sleep(1)
        yield b"world!"

    resp = httpx.post("http://localhost:8000/", content=upload_bytes())
    assert resp.content == b"Hello, world!", resp.content

This fails because the actual response is b''.