Closed n0w4k closed 3 years ago
Hi :wave:,
Thanks for using Falcon. The large amount of time and effort needed to maintain the project and develop new features is not sustainable without the generous financial support of community members like you.
Please consider helping us secure the future of the Falcon framework with a one-time or recurring donation.
Thank you for your support!
Hi @n0w4k ! And thanks for filing this.
I'll schedule this for 3.1 to ensure we make a decision on this.
I've tried digging into this quickly. It seems that, for instance, WSGI (PEP 3333) implies that headers should be treated as latin1
(ISO-8859-1
), OTOH it's clearly articulated only for response headers, not request header parsing.
The ASGI spec conveniently evades this issue by simply providing byte headers, as discussed in your description.
The popular Gunicorn server also opts to treat headers as latin1
(ISO-8859-1
) encoded, see its bytes_to_str()
function.
In that light, I'm inclined to think that we probably don't want to support custom header encodings, but OTOH, it is a bug in Falcon's asgi.App
implementation that we assume UTF-8, and not ISO-8859-1
.
@kgriffs thoughts?
@n0w4k Just to confirm, would defaulting to ISO-8859-1
solve your problem at hand, or are you dealing with other incompatible encodings as well?
Thanks for the reply, @vytas7. I can confirm that all the bytes mentioned in the UnicodeDecodeError
exceptions that we checked (mostly umlauts) could be decoded using ISO-8859-1
, so yes, I am pretty optimistic that defaulting to it would solve our problem. Furthermore, we never had this issue using WSGI (using Falcon 2).
Right, thanks!
It looks like almost all RFCs, WSGI servers and ASGI frameworks assume latin1
by default, this was simply an oversight on our end. I've rescheduled this to the first bugfix release (we have another edge case incompatibility already slated for the release in question).
Currently, headers in Falcon's ASGI package will be decoded using the Python's default UTF-8 decoding. For instance: https://github.com/falconry/falcon/blob/933851183ed9cce2d3411f890dca3e19385480f6/falcon/asgi/request.py#L847 and https://github.com/falconry/falcon/blob/933851183ed9cce2d3411f890dca3e19385480f6/falcon/asgi/request.py#L897
When headers have an encoding different from UTF-8, this leads to
UnicodeDecodeError
. An option to use different charsets here would be much appreciated.Even though headers are supposed to be plain ASCII where it doesn't matter, in reality, there are still headers sent which contain characters especially from the ISO-8859-1 charset. Compare also RFC 7230, section 3.2.4: