Closed cjwatson closed 3 years ago
Good find. The `encoding` parameter was added in 3.2 and is unfortunately not present in 2.7. Yes, 2.7 is EOL, but I'd like to keep compatibility if possible. Any ideas?
Or we could make a cut here, bump the version and drop support for Python < 3.6. I'd be happy with either.
I thought we had already dropped compatibility, and that's why the 0.1 branch exists. For instance, multipart.py has `from urllib.parse import parse_qs` near the top, which is specific to Python 3. And README.rst says:
* **0.2 (19.03.2019)**
* Dropped support for Python versions below 3.6. Stay on 0.1 if you need Python 2.5+ support.
Ah, the top of the README tricked me ;) So, then, no objections.
It's confusing for raw bytes and percent-encoded bytes in URL-encoded form data to be decoded using different character sets; this happened when passing a `charset` parameter other than "utf8" to `parse_form_data`.
https://url.spec.whatwg.org/#application/x-www-form-urlencoded intentionally doesn't cover non-UTF-8 cases, but it explicitly says that a parser should perform bytewise percent-decoding followed by UTF-8 decoding. The design of Python's `parse_qs` means that we have to do this the other way round. Nevertheless, if something other than UTF-8 decoding has been explicitly requested, then it seems to fit the specification better to use the same character set for interpreting raw bytes and for interpreting percent-encoded sequences.
This came up when porting Launchpad to Python 3, because (I assume for historical reasons) zope.publisher deliberately interprets form data using ISO-8859-1 on round-tripping grounds and then re-decodes that to the preferred character set of the request. That is pretty confusing at the best of times, but in particular it went wrong when given form values that had been UTF-8-encoded and then percent-encoded.
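
To make the mismatch concrete, here is a minimal sketch that uses only the standard library's `urllib.parse.parse_qs`, not the actual `parse_form_data` code path. The sample bodies and variable names are made up for illustration, and the round trip at the end only approximates what zope.publisher is described as doing above.

```python
from urllib.parse import parse_qs

charset = "iso-8859-1"  # an explicitly requested, non-UTF-8 charset

# The same character (e-acute, byte 0xE9 in ISO-8859-1), sent once as a
# raw byte and once as a percent-encoded byte.
body = b"raw=\xe9&quoted=%E9"
text = body.decode(charset)

# Mismatch: raw bytes were decoded with `charset`, but parse_qs falls back
# to its defaults (encoding="utf-8", errors="replace") for percent-sequences,
# so the lone 0xE9 byte becomes U+FFFD.
mismatched = parse_qs(text)

# Consistent: the same charset interprets raw and percent-encoded bytes.
consistent = parse_qs(text, encoding=charset)

# Round trip in the style described above (an approximation): read the form
# as ISO-8859-1, then re-decode the value as UTF-8. This only works if the
# percent-encoded bytes were also interpreted as ISO-8859-1.
utf8_body = b"name=caf%C3%A9"  # "café", UTF-8-encoded then percent-encoded
latin1_view = parse_qs(utf8_body.decode(charset), encoding=charset)["name"][0]
round_tripped = latin1_view.encode(charset).decode("utf-8")

print(mismatched)
print(consistent)
print(round_tripped)
```

Without `encoding=charset` in the last step, `parse_qs` would already have produced "café" from the percent-encoded bytes, and re-encoding that as ISO-8859-1 followed by UTF-8 decoding would fail on the lone 0xE9 byte.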