Closed dahlia closed 7 years ago
This seems like a good idea for fitting in with existing workarounds.
FWIW: you could already do this via environ["aiohttp.request"].raw_path
.
I've found this when researching for https://github.com/django/asgiref/issues/87.
It appears to be a bit inconsistent with query info though. The following test passes:
diff --git i/tests/test_environ.py w/tests/test_environ.py
index 3c5ac6d..2499191 100644
--- i/tests/test_environ.py
+++ w/tests/test_environ.py
@@ -89,6 +89,12 @@ def assert_environ_quoted_path_info(environ):
assert environ['RAW_URI'] == "/%ED%85%8C%2F%EC%8A%A4%2F%ED%8A%B8"
assert environ['REQUEST_URI'] == "/%ED%85%8C%2F%EC%8A%A4%2F%ED%8A%B8"
+@environ_application
+def assert_environ_quoted_path_info_with_query(environ):
+ assert environ['PATH_INFO'] == "/foo/bar"
+ assert environ['RAW_URI'] == "/foo%2Fbar?query/info"
+ assert environ['REQUEST_URI'] == "/foo%2Fbar?query/info"
+
class EnvironTest(AsyncTestCase):
@@ -133,3 +139,7 @@ def testEnvironRootSubdirTrailing(self):
def testQuotedPathInfo(self):
with self.run_server(assert_environ_quoted_path_info) as client:
client.assert_response(path="/%ED%85%8C%2F%EC%8A%A4%2F%ED%8A%B8")
+
+ def testQuotedPathInfoWithQuery(self):
+ with self.run_server(assert_environ_quoted_path_info_with_query) as client:
+ client.assert_response(path="/foo%2Fbar?query%2Finfo")
I.e. the query string is percent-decoded here.
This is just a drive-by comment, but I think it would be good to document this via some test at least.
As for ASGI (https://github.com/django/asgiref/issues/87) I think it makes sense to have a raw_path
without the additional query string (which is not unquoted there).
So, you're saying that RAW_URI contains the quoted path, but the unquoted query string. That seems pretty weird. I think aiohttp-wsgi should change this behaiour.
Based on your research, would it be more standard to:
/foo%2Fbar
)./foo%2Fbar?query%2Finfo
).I'm inclined to think (1.) is correct.
This patch adds non-standard fields named
RAW_URI
andREQUEST_URI
to WSGIenviron
. This solves the following problem.According to PEP 3333, WSGI requires to unquote
PATH_INFO
before it's passed to an application, as CGI did. Unfortunately, it makes some unrecoverable loss of its raw request from a client.For example, if a request was
GET /A%2FB
, the%2F
sequence is unquoted to/A/B
.There's one more difficulty about
PATH_INFO
when it comes to Python 3. As WSGI'sPATH_INFO
environ should be a Unicodestr
, percent-encoded sequence (which is encoding of bytes, rather than Unicode characters) needs to be interpreted in a certain encoding — aiohttp-wsgi indeed interprets them in UTF-8 encoding. It means that if a client requests a path containing some bytes encoded in non-UTF-8 the requested path (e.g./%C5%D7%BD%BA%C6%AE
) will be lost in an unrecoverable way.Further reading: WSGI Amendments thoughts: the horror of charsets (Web-SIG).
Since this problem has been quite well known to WSGI implementers, several widely-used WSGI servers provide some non-standard workarounds. uWSGI and Apache mod_wsgi provides
REQUEST_URI
. Gunicorn providesRAW_URI
. These fields contain an untouched and uninterpreted path string. Some WSGI frameworks or applications use it (though they fallback toPATH_INFO
if these fields don't exist). This patch chose these field names rather than introducing yet another non-standard field.