abersheeran / a2wsgi

Convert WSGI app to ASGI app or ASGI app to WSGI app.
Apache License 2.0
226 stars 20 forks source link

ASGIMiddleware cannot handle multiple content-type or content-length environ entries #21

Closed RobbeSneyders closed 2 years ago

RobbeSneyders commented 2 years ago

If multiple content-type or content-length environ entries are present, the resulting headers created by ASGIMiddleware are not valid.

Eg. providing both

results in a duplicate content-length header by the following code:

https://github.com/abersheeran/a2wsgi/blob/9c7cc21d559aeeaf87b359cbf62373c1f1943202/a2wsgi/asgi.py#L59-L73

You can replicate this by wrapping a Flask application in both ASGI and WSGIMiddleware, and using its test_client which sets the double environ entries:

from a2wsgi import ASGIMiddleware, WSGIMiddleware
from flask import Flask, request

app = Flask(__name__)

app.wsgi_app = ASGIMiddleware(WSGIMiddleware(app.wsgi_app))

@app.post('/test')
def post():
    return {'headers': list(request.headers.items())}

client = app.test_client()

data = {'foo': 'bar'}
response = client.post('/test', json=data)
print(response.json)

Which results in:

{'headers': [..., ['Content-Type', 'application/json,application/json'], ['Content-Length', '14,14']]}

After the duplicate header values have been concatenated in the ASGIMiddleware here: https://github.com/abersheeran/a2wsgi/blob/9c7cc21d559aeeaf87b359cbf62373c1f1943202/a2wsgi/wsgi.py#L110-L122

Happy to submit a PR if it's clear which behavior should be implemented.

abersheeran commented 2 years ago

This does confuse me a bit.

I looked at two of the popular WSGI Server implementations and neither of them seem to generate HTTP_CONTENT_LENGTH / HTTP_CONTENT_TYPE.

https://github.com/benoitc/gunicorn/blob/1aae54a8e1a1b3022fbce42259b3bf36e2808f07/gunicorn/http/wsgi.py#L120-L139

https://github.com/Pylons/waitress/blob/84cfc2b8ac81b55a7223f51f6d807f634d98c0f9/src/waitress/task.py#L24-L27

Maybe we should investigate more WSGI Server implementations, and test clients you use.

RobbeSneyders commented 2 years ago

Indeed, I haven't found any other Python examples of this yet. A google search shows that it does occur across frameworks and languages.

Looking at the CGI RFC, this is what it says:

4.1.2. CONTENT_LENGTH

The CONTENT_LENGTH variable contains the size of the message-body attached to the request, if any, in decimal number of octets. If no data is attached, then NULL (or unset).

  CONTENT_LENGTH = "" | 1*digit

The server MUST set this meta-variable if and only if the request is accompanied by a message-body entity. The CONTENT_LENGTH value must reflect the length of the message-body after the server has removed any transfer-codings or content-codings.

4.1.18. Protocol-Specific Meta-Variables

The server SHOULD set meta-variables specific to the protocol and scheme for the request. ... Meta-variables with names beginning with "HTTP" contain values read from the client request header fields, if the protocol used is HTTP. The HTTP header field name is converted to upper case, has all occurrences of "-" replaced with "_" and has "HTTP" prepended to give the meta-variable name. ... The server is not required to create meta-variables for all the header fields that it receives. In particular, it SHOULD remove any header fields carrying authentication information, such as 'Authorization'; or that are available to the script in other variables, such as 'Content-Length' and 'Content-Type'.

While the CONTENT_LENGTH variable MUST be set, the HTTP_CONTENT_LENGTH variable SHOULD be removed.

So both should not, but can be present. Question is if you want to handle this.

I'll open a ticket on werkzeug as well to see if there is a reason not to follow the recommendation and possibly change it.

abersheeran commented 2 years ago

https://github.com/abersheeran/a2wsgi/blob/9c7cc21d559aeeaf87b359cbf62373c1f1943202/a2wsgi/asgi.py#L65

We can skip HTTP_CONTENT_LENGTH / HTTP_CONTENT_TYPE at here.