flipmcf closed this issue 3 years ago.
In waitress 2.0.0, this line breaks: https://github.com/Pylons/waitress/blob/c2980c107a372f635e307ae11d5ac33c2ba57c13/src/waitress/task.py#L279
Why use latin-1? Can we use utf-8?
As a general rule, HTTP headers only support US-ASCII. PEP 3333 allows latin-1 (ISO-8859-1), as quoted below:
Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.
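For illustration, here is a minimal Python sketch of the RFC 2047 route the quote allows, using the standard library's email.header module. The function names are hypothetical. The sketch only shows the encoding itself; it says nothing about whether HTTP clients actually decode such encoded-words.

```python
from email.header import Header, decode_header


def rfc2047_encode(value: str) -> str:
    """Encode a str as an RFC 2047 encoded-word using UTF-8 (hypothetical helper)."""
    # For non-ASCII input, Header.encode() emits an ASCII-only "=?utf-8?...?=" word.
    return Header(value, "utf-8").encode()


def rfc2047_decode(encoded: str) -> str:
    """Reverse rfc2047_encode() by decoding each encoded-word (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(encoded):
        # Encoded-words come back as bytes plus their declared charset.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


filename = "\uff36\uff29\uff33\uff21.mp3"  # fullwidth "ＶＩＳＡ.mp3"
encoded = rfc2047_encode(filename)
# The encoded form is pure ASCII, so it also satisfies the latin-1 rule.
wire = encoded.encode("latin-1")
```

The trade-off is that the encoded-word is only useful if the receiving client knows to decode it.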
What I've done historically for Content-Disposition is use the unidecode package to transliterate UTF-8 filenames into something that fits. Unfortunately, this isn't really something that waitress is doing incorrectly, AFAIK.
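A rough sketch of that approach, using the standard library's unicodedata as a stand-in for unidecode (a swapped-in technique, not unidecode's API). Compatibility decomposition handles accented Latin letters and fullwidth forms, but unlike unidecode it simply drops scripts that have no ASCII decomposition, such as CJK. The helper name and the "_" placeholder are hypothetical.

```python
import unicodedata


def ascii_filename(name: str, replacement: str = "_") -> str:
    """Best-effort ASCII version of a filename using only the stdlib (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark and maps
    # compatibility characters (e.g. fullwidth letters) to their plain forms.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop everything that is still not ASCII (combining marks, CJK, ...).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Fall back to a placeholder if nothing survives at all.
    return ascii_only or replacement
```

For example, ascii_filename("café.pdf") gives "cafe.pdf", while an all-CJK name collapses to the placeholder.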
We could try to provide better support for RFC 2047-formatted strings, as you noted, but I'm not sure it would be worth it. I have no idea what browser support is like for that, and I suspect it's better to just stick to US-ASCII.
Thank you for the reply. I agree that this is not the job of waitress. The filename should be properly encoded before waitress is involved.
My specific 'bug' (using the term loosely) was triggered when a user 'cleverly' used Unicode characters in a filename.
When using Plone to @@stream a file, debugging shows everything is fine, except that byte encoding fails because the filename contains characters outside the latin-1 character set: "ＶＩＳＡ.mp3". Changing that line to
return bytes(s, "utf-8")
makes it work just fine, but I'm not sure whether it breaks some HTTP rule. Is simply changing to utf-8 a good fix?
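A small sketch reproducing the failure and what the UTF-8 change puts on the wire. header_bytes is a hypothetical stand-in for the one-line conversion in task.py, not waitress's actual code, and the header value is assumed for illustration. The last step shows how a peer that decodes header bytes as latin-1, as PEP 3333 assumes, would read the UTF-8 bytes.

```python
def header_bytes(s: str, encoding: str = "latin-1") -> bytes:
    # Hypothetical helper mirroring the bytes(s, "latin-1") conversion discussed above.
    return bytes(s, encoding)


filename = "\uff36\uff29\uff33\uff21.mp3"  # fullwidth "ＶＩＳＡ.mp3"
value = f'attachment; filename="{filename}"'

try:
    header_bytes(value)
    latin1_ok = True
except UnicodeEncodeError:
    # Fullwidth letters such as U+FF36 are outside latin-1's 256 code points.
    latin1_ok = False

# The proposed change: encode as UTF-8 instead, which never raises here.
raw = header_bytes(value, "utf-8")

# A latin-1 reader sees each UTF-8 byte as a separate character (mojibake).
as_seen_by_latin1_reader = raw.decode("latin-1")
print(as_seen_by_latin1_reader)
```

So the UTF-8 change avoids the exception, but it does not make the filename readable to a client that follows the latin-1 convention.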
Edit: I may have opened a can of worms. It looks like, at least for my edge case, https://tools.ietf.org/html/rfc2184 says to do something like:
filename*=utf-8'zh-cn'%EF%BC%B6%EF%BC%A9%EF%BC%B3%EF%BC%A1.mp3
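For reference, RFC 2184 was later obsoleted by RFC 2231, and its HTTP adaptation is RFC 5987 (since replaced by RFC 8187), which RFC 6266 uses for the Content-Disposition filename* parameter. On the wire the value is percent-encoded UTF-8 rather than raw bytes, and RFC 6266 suggests also sending a plain filename for clients that ignore filename*. A minimal Python sketch, assuming the caller sets the returned string as the header value; content_disposition and the "download" fallback name are hypothetical:

```python
import unicodedata
from urllib.parse import quote, unquote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value with an ASCII fallback and filename* (hypothetical)."""
    # ASCII fallback for clients that ignore filename*: NFKD + drop non-ASCII,
    # then replace quotes and backslashes so the quoted-string stays valid.
    fallback = unicodedata.normalize("NFKD", filename)
    fallback = fallback.encode("ascii", "ignore").decode("ascii")
    fallback = fallback.replace("\\", "_").replace('"', "_") or "download"
    # RFC 5987 ext-value: charset, empty language tag, percent-encoded UTF-8.
    encoded = quote(filename, safe="", encoding="utf-8")
    return f'{disposition}; filename="{fallback}"; ' + f"filename*=UTF-8''{encoded}"


value = content_disposition("\uff36\uff29\uff33\uff21.mp3")
```

Because the resulting value is pure ASCII, it passes a latin-1 header encoder unchanged, which keeps the fix on the application side rather than in waitress.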
I'd much rather tell my users to use ASCII-only filenames, but I do think we should at least guard against the exception somehow. Encoding the entire HTTP response as UTF-8 seems to help more than it hurts.