Open nburns opened 11 months ago
Looks like the issue might stem from the usage of SimpleCookie in the standard lib:
> ipython
In [1]: from http import cookies
In [2]: c = cookies.SimpleCookie()
In [3]: c.load('foo=bar; baz="qux; oh=no;')
In [4]: c.items()
Out[4]: dict_items([('foo', <Morsel: foo=bar>)])
edit: found the python bug: https://github.com/python/cpython/issues/92936
Possibly, though we also do something with handling quoted cookies ourselves, as per #5397. I don't remember the details though.
@Dreamsorcerer I updated my original description, the real issue is that you lose the cookies (silently with no error) that appear after a cookie with a value including a double quote in a request.
This is something of a showstopper in practice if your web app depends on cookie-based client sessions: if any cookie with a double quote gets set and appears before your session cookie, it will break sessions for that client. We saw this especially with Chrome, which seems to send more cookies than, say, Safari. It's also pretty tricky to reproduce or discover if you're unaware of it.
Yes, I understood the issue, just lacking time to investigate if it relates to the code mentioned in the other issue, or if it's purely an issue upstream.
> if any cookie with a double quote gets set
Presumably, only a cookie with a single (or odd?) number of quotes, maybe even only a cookie which starts with a quote and doesn't include a second one.
In my testing it's any number of quotes in any position in the cookie value
My point was that it must handle properly quoted cookies fine: 'foo=bar; baz="qux"; oh=no;'
In other words, it only fails on invalid cookie values. Given they are invalid, it could be difficult to clearly define the correct behaviour.
e.g. In your original post you say that Flask correctly parses 3 cookies from that string, but couldn't the intention have been for 2 cookies ({'foo': 'bar', 'baz': '"qux; oh=no'})?
However, the RFC would actually treat this as 3 values. So, feel free to make a bug report upstream, as it does say that user agents MUST use that algorithm: https://www.rfc-editor.org/rfc/rfc6265.html#section-5.2
I think the original implementation was following https://datatracker.ietf.org/doc/html/rfc2109.html, which has now been obsoleted by the above-mentioned RFC (and another before that).
I've reread that spec, and it appears I was wrong. The algorithm I was looking at is for parsing the Set-Cookie header.
For the Cookie header, the RFC only states that such cookies are invalid (cookie values can't contain "). So, I think we are technically compliant with the spec either way this works.
I suggest you file bugs against whatever software is producing these invalid cookies. If cpython takes the PR, that'll be good to have the extra compatibility, but it's technically compliant either way.
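For reference, the cookie-value grammar in RFC 6265 (section 4.1.1, reused by the Cookie header syntax) can be checked with a short regular expression. This is only a sketch to make "invalid" concrete; `is_valid_cookie_value` is a hypothetical helper name, not something aiohttp or the standard library provides.

```python
import re

# cookie-octet per RFC 6265 section 4.1.1: printable US-ASCII excluding
# whitespace, double quote, comma, semicolon and backslash.
_COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
_COOKIE_VALUE = re.compile(rf'(?:{_COOKIE_OCTET}*|"{_COOKIE_OCTET}*")')


def is_valid_cookie_value(value: str) -> bool:
    """Return True if `value` matches the RFC 6265 cookie-value grammar."""
    return _COOKIE_VALUE.fullmatch(value) is not None


print(is_valid_cookie_value("bar"))        # True
print(is_valid_cookie_value('"qux"'))      # True: fully wrapped in quotes
print(is_valid_cookie_value('"qux'))       # False: unbalanced quote
print(is_valid_cookie_value('{"a": 1}'))   # False: raw JSON has quotes and a space
```

Under this grammar a double quote is only allowed as a matching pair wrapping the whole value, which is why raw JSON in a cookie counts as invalid.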
We can argue about specs and what should be doing what all day, but the fact remains that unexpected input in cookies from servers/services that developers don't control silently breaks aiohttp's cookie parsing in an extremely confusing way.
There are plenty of other Python and non-Python web servers that are more robust in this case. webob, for example, handles them just fine: https://github.com/Pylons/webob/blob/main/src/webob/cookies.py#L158 as do reverse proxies like nginx and web browsers in general.
I've been able to patch around this issue by overriding aiohttp's cookie parsing:
from types import MappingProxyType

import aiohttp.web
import aiohttp.web_request
import webob.cookies

class Request(aiohttp.web_request.Request):
    @aiohttp.web_request.reify
    def cookies(self):
        # Parse the raw Cookie header with webob instead of SimpleCookie.
        raw = self.headers.get(aiohttp.web_request.hdrs.COOKIE, "")
        parsed_all = webob.cookies.parse_cookie(raw)
        parsed = {k.decode("utf-8"): v for k, v in parsed_all}
        return MappingProxyType(parsed)

class Application(aiohttp.web.Application):
    # Make the application build requests with the patched Request class.
    def _make_request(self, message, payload, protocol, writer, task, _cls=Request):
        return super()._make_request(message, payload, protocol, writer, task, _cls=_cls)
Following my latest summary, we could arguably improve it slightly by splitting on ; and feeding the substrings to SimpleCookie individually (some cookie values would still get lost though). But it would be better to have this done upstream.
Nice idea, maybe combined with a log.error if the value is lost?
Perhaps a warning or info level log, sure. If there's no further movement on the issue upstream, then feel free to make a PR with those changes here.
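A minimal sketch of the idea discussed above (split on `;`, parse each piece on its own, log what gets dropped) might look like the following. It uses only the standard library; `parse_cookie_header_leniently` is a hypothetical name and this is not aiohttp's actual implementation. As noted, a properly quoted value that itself contains `;` would still be split and lost.

```python
import logging
from http.cookies import CookieError, SimpleCookie

log = logging.getLogger(__name__)


def parse_cookie_header_leniently(header: str) -> dict[str, str]:
    """Parse each `;`-separated pair separately so one bad pair cannot
    hide the cookies that follow it."""
    parsed: dict[str, str] = {}
    for chunk in header.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        cookie = SimpleCookie()
        try:
            cookie.load(chunk)
        except CookieError:
            cookie.clear()
        if not cookie:
            # Log only the name part to avoid leaking cookie values.
            log.warning("Dropping unparsable cookie: %r", chunk.partition("=")[0])
            continue
        for name, morsel in cookie.items():
            parsed[name] = morsel.value
    return parsed


cookies = parse_cookie_header_leniently('foo=bar; baz="qux; oh=no;')
print(cookies["foo"], cookies["oh"])
```

With this approach `foo` and `oh` are both kept for the example header; whether `baz` survives depends on how the installed Python's SimpleCookie treats the stray quote.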
Describe the bug
Cookie values after a double quote (") are not parsed. All subsequent cookies in the request are silently dropped.
To Reproduce
from aiohttp import web
async def hello(request):
    return web.Response(text=str(request.cookies))
app = web.Application()
app.add_routes([web.get("/", hello)])
web.run_app(app)
the bug:
Python Version
aiohttp Version
multidict Version
yarl Version
OS
macos 14.1.1 (23B81)
Related component
Server
Additional context
We recently deployed a service using aiohttp and noticed that some users would be "logged in", i.e. have a valid session according to our main server (webapp2), but not be logged in on our async aiohttp server. It was really hard to narrow down the specific issue, but eventually we found that a request with a cookie value containing JSON before our session cookie would cause the user to appear logged out to our aiohttp server.
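Where the service writing the JSON cookie is under your control, one way to sidestep the problem is to percent-encode the JSON so the value never contains a raw double quote. The sketch below is an assumption about how that could be done with the standard library; `encode_json_cookie`, `decode_json_cookie`, and the `prefs`/`session` cookie names are hypothetical and not taken from our setup.

```python
import json
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote


def encode_json_cookie(data) -> str:
    """Serialize to compact JSON, then percent-encode everything that is
    not an unreserved character (so no quotes, commas or spaces remain)."""
    return quote(json.dumps(data, separators=(",", ":")), safe="")


def decode_json_cookie(value: str):
    """Reverse of encode_json_cookie."""
    return json.loads(unquote(value))


prefs = {"theme": "dark", "beta": True}
header = f"prefs={encode_json_cookie(prefs)}; session=abc123"

jar = SimpleCookie()
jar.load(header)
print(jar["session"].value)                      # abc123
print(decode_json_cookie(jar["prefs"].value))    # {'theme': 'dark', 'beta': True}
```

This does not help with cookies set by third parties, which is the case that bit us.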
Other servers handle the data more robustly. Flask, for example, will correctly parse the 2nd cookie and the cookies after it: