defnull / multipart

Multipart parser for Python 3

Consistently decode URL-encoded form data #30

Closed cjwatson closed 3 years ago

cjwatson commented 3 years ago

It's confusing for raw bytes and percent-encoded bytes in URL-encoded form data to be decoded using different character sets, which is what happened when a charset parameter other than "utf8" was passed to parse_form_data.

https://url.spec.whatwg.org/#application/x-www-form-urlencoded intentionally doesn't cover non-UTF-8 cases, but it does say explicitly that a parser should perform bytewise percent-decoding followed by UTF-8 decoding. The design of Python's parse_qs means that we have to do this the other way round: the raw bytes are decoded into a string first, and the percent-encoded sequences are decoded afterwards. Nevertheless, if something other than UTF-8 decoding has been explicitly requested, it seems to fit the specification better to use the same character set for interpreting both raw bytes and percent-encoded sequences.
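A minimal sketch of the mismatch, using only the standard library's urllib.parse.parse_qs. The helper names decode_form_inconsistent and decode_form_consistent are hypothetical and not part of multipart's API; the sketch assumes the fix amounts to passing the requested charset to parse_qs as its encoding argument, so that %XX escapes are decoded with the same character set as the raw bytes.

```python
from urllib.parse import parse_qs


def decode_form_inconsistent(body: bytes, charset: str) -> dict:
    # Hypothetical helper. Raw bytes are decoded with `charset`, but parse_qs
    # falls back to its default encoding ("utf-8") for %XX escapes.
    return parse_qs(body.decode(charset), keep_blank_values=True)


def decode_form_consistent(body: bytes, charset: str) -> dict:
    # Hypothetical helper. The same character set interprets both the raw
    # bytes and the %XX escapes.
    return parse_qs(body.decode(charset), keep_blank_values=True, encoding=charset)


# "é" in ISO-8859-1, once as a raw byte and once percent-encoded.
body = b"raw=\xe9&pct=%E9"
before = decode_form_inconsistent(body, "iso-8859-1")
after = decode_form_consistent(body, "iso-8859-1")
# before["pct"] == ["\ufffd"]  (byte 0xE9 on its own is not valid UTF-8)
# after["pct"] == ["\xe9"]     (the same "é" as after["raw"])
```

With ISO-8859-1 requested, the raw byte and the %E9 escape denote the same character, but only the consistent variant decodes them identically. The other variant yields U+FFFD (the replacement character, since parse_qs defaults to errors="replace") for the percent-encoded value.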

This came up when porting Launchpad to Python 3. zope.publisher (for historical reasons, I assume) deliberately interprets form data as ISO-8859-1 on round-tripping grounds and then re-decodes the result to the preferred character set of the request. That is pretty confusing at the best of times, but in particular it went wrong when given form values that had been UTF-8-encoded and then percent-encoded.
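The failure can be approximated in a few lines. zope_style_value is a hypothetical stand-in, not zope.publisher's actual code: it assumes the publisher reads the body as ISO-8859-1, extracts a field named q, and then re-encodes the value to ISO-8859-1 and decodes it with the request's charset.

```python
from urllib.parse import parse_qs

LATIN1 = "iso-8859-1"


def zope_style_value(body: bytes, request_charset: str, consistent: bool) -> str:
    """Hypothetical approximation of the ISO-8859-1 round trip for field 'q'."""
    # Step 1: interpret the raw form bytes as ISO-8859-1 (every byte maps to a char).
    text = body.decode(LATIN1)
    if consistent:
        # %XX escapes are decoded with the same charset as the raw bytes.
        fields = parse_qs(text, encoding=LATIN1)
    else:
        # parse_qs defaults to UTF-8 for %XX escapes: the inconsistent case.
        fields = parse_qs(text)
    value = fields["q"][0]
    # Step 2: undo the ISO-8859-1 step and decode with the request's charset.
    return value.encode(LATIN1).decode(request_charset)


fixed = zope_style_value(b"q=%C3%A9", "utf-8", consistent=True)    # "é"
raw = zope_style_value(b"q=\xc3\xa9", "utf-8", consistent=False)   # "é"
try:
    zope_style_value(b"q=%C3%A9", "utf-8", consistent=False)
except UnicodeDecodeError as exc:
    # The escapes were already turned into "é", which re-encodes to the
    # single byte 0xE9, and that is not valid UTF-8 on its own.
    print("inconsistent decoding fails:", exc.reason)
```

A raw UTF-8 value survives the round trip either way, because parse_qs leaves characters that are not percent-escapes alone. Only the UTF-8-then-percent-encoded value breaks, since its escapes are decoded as UTF-8 while everything else is treated as ISO-8859-1.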

defnull commented 3 years ago

Good find. The encoding parameter of parse_qs was added in 3.2 and is unfortunately not present in 2.7. Yes, 2.7 is EOL, but I'd like to keep compatibility if possible. Any ideas?

Or we could make a cut here, bump the version, and drop support for Python < 3.6. I'd be happy with either.

cjwatson commented 3 years ago

I thought we had already dropped Python 2 compatibility, and that's why the 0.1 branch exists. For instance, multipart.py has from urllib.parse import parse_qs near the top, which is specific to Python 3. And README.rst says:

* **0.2 (19.03.2019)**
  * Dropped support for Python versions below 3.6. Stay on 0.1 if you need Python 2.5+ support.
defnull commented 3 years ago

Ah, the top of the README tricked me ;) So, then, no objections.