benoitc / gunicorn

gunicorn 'Green Unicorn' is a WSGI HTTP Server for UNIX, fast clients and sleepy applications.
http://www.gunicorn.org
Other
9.71k stars 1.74k forks source link

Gunicorn ignores the `Content-Length` header when the connection is half-closed. #3234

Open kenballus opened 2 months ago

kenballus commented 2 months ago

Description of the bug

From RFC 9112:

If a valid Content-Length header field is present without Transfer-Encoding, its decimal value defines the expected message body length in octets. If the sender closes the connection or the recipient times out before the indicated number of octets are received, the recipient MUST consider the message to be incomplete and close the connection.

Gunicorn does not enforce this rule. When it receives a request, and the sender half-closes the connection, Gunicorn responds regardless of whether the request's body has been fully received.

To reproduce

  1. Start a Gunicorn-based HTTP server that echos the message body. (e.g., something like this)
  2. Send it a request with an incomplete message body, followed by half-closing the socket, and observe that it still responds:
    printf 'GET / HTTP/1.1\r\nHost: a\r\nContent-Length: 10\r\n\r\nA' | nc localhost 80
    
    HTTP/1.1 200 OK
    Server: gunicorn
    Date: Thu, 04 Jul 2024 16:21:12 GMT
    Connection: keep-alive
    Content-type: application/json
    Content-Length: 133

{"headers":[["SE9TVA==","YQ=="],["Q09OVEVOVF9MRU5HVEg=","MTA="]],"body":"QQ==","version":"SFRUUC8xLjE=","uri":"Lw==","method":"R0VU"}

5. Decode the response to see that it accepted the incomplete message body:
```sh
printf '{"headers":[["SE9TVA==","YQ=="],["Q09OVEVOVF9MRU5HVEg=","MTA="]],"body":"QQ==","version":"SFRUUC8xLjE=","uri":"Lw==","method":"R0VU"}' \
    | jq '.["body"]' \
    | xargs echo \
    | base64 -d \
    | xxd
00000000: 41                                       A

Gunicorn version: 22.0.0 Python version: 3.11.9

benoitc commented 1 month ago

Body is streamed to the application. There is no reason to pass even incomplete header to the appplication. Otherwise that would force the gateway to buffer the whole body which is properly inneficient. The application beg-hind must take care of an incomplete body as a result. What do I miss there?

kenballus commented 1 month ago

Body is streamed to the application. There is no reason to pass even incomplete header to the appplication.

I don't know what you're trying to say here.

Otherwise that would force the gateway to buffer the whole body which is properly inneficient.

Obviously, the gateway shouldn't buffer the whole request. No one is suggesting that.

The application beg-hind must take care of an incomplete body as a result.

This just doesn't make sense. The application behind must not respond to the request until it is complete. If the connection is half-closed before the request is complete, then the application should close the connection. What it should not do is pretend that the message is complete.

If you're curious, this is what other HTTP servers do:

Close the connection without responding:

Close the connection and respond 400:

pajod commented 1 month ago

One existing test case even has what should be treated as truncated:

https://github.com/benoitc/gunicorn/blob/2d7eb3dc048584341654ddfea0dc7ea8c8e11428/tests/requests/valid/099.http#L10

benoitc commented 1 month ago

Contrary to some other wsgi implementations, the body is fully streamed to the application when it comes. That's per design and reduce the memory footprint. I would expect that the application validate the payload it received via the gateway before handling any action. This includes to check if its partial or not. Gunicorn has no control in the application once a chunk is given to it. This has never been à issue. Sometimes it is also useful to even handle partial content. Also when the Content length doesn't correspond gunicorn must close the connection.

However can you clarify what you mean by half closed? I don't see how a content can be returned if the socket is closed.

kenballus commented 1 month ago

gunicorn will certainly not buffer the body for the application until it's fully read. The body is fully streamed to the application when it comes. That's per design and reduce the memory footprint. I would expect that the application validate the payload it received via the gateway before handling any action. This has never been à issue.

No one is asking for Gunicorn to buffer the body for the application. This is entirely unrelated to what I'm talking about. As usual, it feels like we're talking past each other :)

Also when the Content length doesn't correspond gunicorn must close the connection.

When the Content-Length is too large for the received data, Gunicorn should treat the request as invalid, respond 400, and close the connection. It doesn't currently do this.

However can you clarify what you mean by half closed? I don't see how a content can be returned if the socket is closed.

A TCP connection is half-closed when one side has sent a FIN and the other side hasn't. The relevant section of the TCP RFC.

benoitc commented 1 month ago

When the Content-Length is too large for the received data, Gunicorn should treat the request as invalid, respond 400, and close the connection. It doesn't currently do

what do you mean by "the Content-Length is too large for the received data" ?

I am expectinv that the connection is closed by Gunicorn if it continue to receive some data passed the content length. If it's not the case this is a bug indeed.

Is this what you mean?

kenballus commented 1 month ago

I am expectinv that the connection is closed by Gunicorn if it continue to receive some data passed the content length. If it's not the case this is a bug indeed.

I mean the opposite of this. Consider the example request from the first message. The Content-Length value is 10, but the message body is only 1 byte long. This should elicit a 400 response, but it doesn't.

benoitc commented 1 month ago

I don't see how this could work . At that time we already passed some information to the application in the start_response handler 'gunicorn doesn't buffer). The application need to be aware we are closing this. Which is not planned with current expectation. Instead I would expect the application is taking care about what it read, not the gateway witch is only passing the data through to the application.