django / channels

Developer-friendly asynchrony for Django
https://channels.readthedocs.io
BSD 3-Clause "New" or "Revised" License
6.06k stars 801 forks source link

CookieMiddleware should decode() using `latin-1`. #1450

Closed vaisaghvt closed 3 years ago

vaisaghvt commented 4 years ago

We have been periodically getting this error stack trace in our logs:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 745: ordinal not in range(128)
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
  File "/usr/local/lib/python3.7/site-packages/daphne/server.py", line 201, in create_application
  File "/usr/local/lib/python3.7/site-packages/channels/routing.py", line 54, in __call__
  File "/usr/local/lib/python3.7/site-packages/channels/security/websocket.py", line 37, in __call__
  File "/usr/local/lib/python3.7/site-packages/channels/sessions.py", line 41, in __call__

When investigating, I found out that the issue is at this line: https://github.com/django/channels/blob/dd304566775d6f1ce3d40a4d6e2a100f33c54889/channels/sessions.py#L41 when there is a cookie that has a non-ascii value.

Is there a reason that it is assumed that the cookie value will not have non-ascii values? If so, what would be the work around?

If not, is there an issue with changing this?

For example, would it be safe to do this: cookies = parse_cookie(value.decode("ascii", errors='ignore'))

hwooson commented 4 years ago

I have the same issue using the version of 2.3.0.

carltongibson commented 4 years ago

As far as I can tell cookies are meant to only contain US-ASCII characters...

e.g. https://tools.ietf.org/id/draft-ietf-httpstate-cookie-23.html#sane-set-cookie-syntax

cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                      ; US-ASCII characters excluding CTLs,
                      ; whitespace DQUOTE, comma, semicolon,
                      ; and backslash

Searching doesn't point to anything particularly definitive.

Will close: happy to re-open if more info comes up.

carltongibson commented 4 years ago

Some historical discussion on Gunicorn here: https://github.com/benoitc/gunicorn/issues/1778

Eventually looks like they eased to latin-1. https://github.com/benoitc/gunicorn/pull/1914/files

Maybe we should do that.

carltongibson commented 4 years ago

PEP 333:

Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.

So it looks like latin-1 is correct, even if folks SHOULD use ascii.

carltongibson commented 4 years ago

Todo:

Dhruv-Sachdev1313 commented 3 years ago

@carltongibson Hey is this issue still pending ?

carltongibson commented 3 years ago

Hi @Dhruv-Sachdev1313 — looks like it's handled in #1494. If you want to review that it would be great.

Dhruv-Sachdev1313 commented 3 years ago

Okay @carltongibson . Thanks !!