Pylons / pyramid

Pyramid - A Python web framework
https://trypyramid.com/
Other
3.97k stars 886 forks source link

CSRF assumes the token to be valid latin-1 #3763

Open hynek opened 4 months ago

hynek commented 4 months ago

Bug Report

Describe the bug Today Sentry reported a crash that I don't think I can do anything about (except adding custom checks) where someone (presumable an attacker) submitted a CSRF token that cannot be encoded at latin-1.

To Reproduce I'm currently traveling so can't come up with anything simple, but given that the check looks like this: https://github.com/Pylons/pyramid/blob/ef0f6861e5b439afe43983f6c7437c30a413a34d/src/pyramid/csrf.py#L43-L48

and bytes_ looks like this:

https://github.com/Pylons/pyramid/blob/ef0f6861e5b439afe43983f6c7437c30a413a34d/src/pyramid/util.py#L38-L43

It makes sense that if someone manages to sneak in a token that's not latin-1-encodable, it will crash with an UnicodeEncodeError.

I guess wrapping strings_differ into a try except UnicodeError this would fix it?

For completeness, the token in question were:

Unfortunately it's a bit difficult to trace what exactly happens, because Sentry removes everything with token in the name.

Expected behavior No crash.

Additional context

I'm pretty sure I'm not doing anything wrong; my app doesn't appear in the traceback except for tweens that don't touch the headers at all.

It's Pyramid 2.0.2 running in Unicorn 22.0.0 and

config.set_default_csrf_options(require_csrf=True)
config.set_csrf_storage_policy(CookieCSRFStoragePolicy())
mmerickel commented 4 months ago

I feel like this is a valid concern and it's natural to say that it should just be counted as not-equal versus raising an exception for an unusable value in the supplied csrf value. We should fix this.

ztane commented 1 month ago

Another approach would be to use errors='backslashreplace' for the supplied_token, then all non-representable characters would turn into Unicode escapes:

>>> '😀'.encode('latin-1', errors='backslashreplace')
b'\\U0001f600'