aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

aiohttp.web fails to parse cookies after a cookie with quotes #7993

Open nburns opened 11 months ago

nburns commented 11 months ago

Describe the bug

Cookie values after a double quote " are not parsed. All subsequent cookies in the request are silently dropped.

To Reproduce

  1. Set up the following simple aiohttp server:

    #!/usr/bin/env python
    from aiohttp import web

    async def hello(request):
        # Echo the parsed cookies back so the client can inspect them.
        return web.Response(text=str(request.cookies))

    app = web.Application()
    app.add_routes([web.get("/", hello)])

    web.run_app(app)

  2. Run the server.
  3. Make a request like this one, which has a cookie with a double quote in its value: `curl http://localhost:8080 -H 'Cookie: baz="qux; foo=bar;'`
  4. Notice that the response body/cookies are empty, not `{'baz': '"qux', 'foo': 'bar'}`.

Expected behavior

The cookie value would be parsed and returned with the double quote in the value, and subsequent cookies would not be silently dropped.
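As an illustration of that expected result, here is a minimal sketch of a lenient parser that splits the header on `;` and then on the first `=`. The helper name `lenient_parse_cookies` is hypothetical; this is not how aiohttp, Flask, or the standard library actually parse cookies, and it deliberately does no unquoting or validation.

```python
def lenient_parse_cookies(header: str) -> dict[str, str]:
    """Split a Cookie header into name/value pairs without validating values."""
    cookies = {}
    for pair in header.split(";"):
        # partition() splits on the first "=" only, so values may contain "=".
        name, sep, value = pair.strip().partition("=")
        if not sep or not name:
            # Skip empty segments (e.g. after a trailing ";") and bare names.
            continue
        cookies[name.strip()] = value.strip()
    return cookies


print(lenient_parse_cookies('baz="qux; foo=bar;'))
# {'baz': '"qux', 'foo': 'bar'}
```

The trade-off is that a stray quote is kept verbatim in the value, so the application has to decide what to do with it.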

Logs/tracebacks

correct behavior:
> curl http://localhost:8080 -H 'Cookie: foo=bar;'
{'foo': 'bar'}
> curl http://localhost:8080 -H 'Cookie: foo=bar; baz=qux;'
{'foo': 'bar', 'baz': 'qux'}
> curl http://localhost:8080 -H 'Cookie: foo=bar; baz=qux; foo2=bar2'
{'foo': 'bar', 'baz': 'qux', 'foo2': 'bar2'}

the bug:

> curl http://localhost:8080 -H 'Cookie: foo=bar; baz="qux; foo2=bar2'
{'foo': 'bar'}

Python Version

$ python --version
Python 3.11.5

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.1
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache 2
Location: /Users/nick/.asdf/installs/python/3.11.5/lib/python3.11/site-packages
Requires: aiosignal, attrs, frozenlist, multidict, yarl
Required-by: openai

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/nick/.asdf/installs/python/3.11.5/lib/python3.11/site-packages
Requires:
Required-by: aiohttp, yarl

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.4
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache-2.0
Location: /Users/nick/.asdf/installs/python/3.11.5/lib/python3.11/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

macos 14.1.1 (23B81)

Related component

Server

Additional context

We recently deployed a service using aiohttp and noticed that some users would be "logged in", i.e. have a valid session, on our main server (webapp2) but not on our async aiohttp server. It was really hard to narrow down the specific issue, but eventually we found that a request with a cookie value containing JSON before our session cookie would cause the user to appear logged out to our aiohttp server.

Other servers handle the data more robustly. Flask, for example, correctly parses the second cookie and the cookies after it:

#!/usr/bin/env python3

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def hello():
    return str(request.cookies)

def main():
    app.run(debug=True)

if __name__ == '__main__':
    main()

nburns commented 11 months ago

looks like the issue might stem from the usage of SimpleCookie in the standard lib

> ipython

In [1]: from http import cookies

In [2]: c = cookies.SimpleCookie()

In [3]: c.load('foo=bar; baz="qux; oh=no;')

In [4]: c.items()
Out[4]: dict_items([('foo', <Morsel: foo=bar>)])

edit: found the python bug: https://github.com/python/cpython/issues/92936

Dreamsorcerer commented 11 months ago

Possibly, though we also do something with handling quoted cookies ourselves, as per #5397. I don't remember the details though.

nburns commented 10 months ago

@Dreamsorcerer I updated my original description. The real issue is that you silently lose, with no error, every cookie that appears after a cookie whose value includes a double quote.

This is something of a showstopper in practice if you depend on cookie-based client sessions in your web app: if any cookie with a double quote gets set and appears before your session cookie, it breaks sessions for that client. We saw this especially with Chrome, which seems to send more cookies than, say, Safari. It's also pretty tricky to reproduce or discover if you're unaware of it.

Dreamsorcerer commented 10 months ago

Yes, I understood the issue, just lacking time to investigate if it relates to the code mentioned in the other issue, or if it's purely an issue upstream.

> if any cookie with a double quote gets set

Presumably, only a cookie with a single quote (or an odd number of quotes?), maybe even only a cookie whose value starts with a quote and doesn't include a second one.

nburns commented 10 months ago

In my testing it's any number of quotes in any position in the cookie value

Dreamsorcerer commented 10 months ago

My point was that it must handle quoted cookies fine, e.g. 'foo=bar; baz="qux"; oh=no;'. In other words, it only fails on invalid cookie values. Given that they are invalid, it could be difficult to clearly define the correct behaviour.

e.g. In your original post you say that Flask correctly parses 3 cookies from that string, but couldn't the intention have been for 2 cookies ({'foo': 'bar', 'baz': '"qux; oh=no'})?

However, the RFC would actually treat this as 3 values. So, feel free to make a bug report upstream, as it does say that user agents MUST use that algorithm: https://www.rfc-editor.org/rfc/rfc6265.html#section-5.2

Dreamsorcerer commented 10 months ago

I think the original implementation was following RFC 2109 (https://datatracker.ietf.org/doc/html/rfc2109.html), which has since been obsoleted by the above-mentioned RFC (and another before that).

Dreamsorcerer commented 2 months ago

I've reread that spec, and it appears I was wrong. The algorithm I was looking at is for parsing the Set-Cookie header.

For the Cookie header, the RFC only states that such cookies are invalid (cookie values can't contain "). So, I think we are technically compliant with the spec either way this works.
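To make the "invalid" claim concrete, here is a rough sketch of a validity check based on the `cookie-value` grammar in RFC 6265 section 4.1.1, where a value is either a run of `cookie-octet` characters or such a run wrapped in a pair of double quotes. The function name `is_valid_cookie_value` is made up for illustration and is not part of aiohttp or the standard library.

```python
import re

# cookie-octet per RFC 6265 section 4.1.1: printable US-ASCII excluding
# whitespace, DQUOTE, comma, semicolon, and backslash.
_COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
_COOKIE_VALUE = re.compile(rf'(?:{_COOKIE_OCTET}*|"{_COOKIE_OCTET}*")')


def is_valid_cookie_value(value: str) -> bool:
    """Return True if value matches the RFC 6265 cookie-value grammar."""
    return _COOKIE_VALUE.fullmatch(value) is not None


print(is_valid_cookie_value('qux'))    # True
print(is_valid_cookie_value('"qux"'))  # True
print(is_valid_cookie_value('"qux'))   # False
```

Under this grammar a lone or embedded double quote makes the value invalid, which is the case reported in this issue.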

I suggest you file bugs against whatever software is producing these invalid cookies. If cpython takes the PR, that'll be good to have the extra compatibility, but it's technically compliant either way.

nburns commented 2 months ago

We can argue about specs and what should be doing what all day, but the fact remains that unexpected input in cookies from servers/services that developers don't control silently breaks aiohttp's cookie parsing in an extremely confusing way.

There are plenty of other Python and non-Python web servers that are more robust in this case. webob, for example, handles them just fine: https://github.com/Pylons/webob/blob/main/src/webob/cookies.py#L158 as do reverse proxies like nginx and web browsers in general.

I've been able to patch around this issue by overriding aiohttp's cookie parsing:

import aiohttp.web
import aiohttp.web_request
import webob.cookies
from types import MappingProxyType

class Request(aiohttp.web_request.Request):
    @aiohttp.web_request.reify
    def cookies(self):
        # Parse the raw Cookie header with webob instead of SimpleCookie.
        raw = self.headers.get(aiohttp.web_request.hdrs.COOKIE, "")
        parsed_all = webob.cookies.parse_cookie(raw)
        parsed = {k.decode("utf-8"): v for k, v in parsed_all}
        return MappingProxyType(parsed)

class Application(aiohttp.web.Application):
    # Hook the custom Request class into request construction.
    def _make_request(self, message, payload, protocol, writer, task, _cls=Request):
        return super()._make_request(message, payload, protocol, writer, task, _cls=_cls)

Dreamsorcerer commented 2 months ago

Following my latest summary, we could arguably improve it slightly by splitting on ; and feeding the substrings to SimpleCookie individually (some cookie values would still get lost though). But it would be better to have this done upstream.

nburns commented 2 months ago

Nice idea, maybe combined with a log.error if the value is lost?

Dreamsorcerer commented 2 months ago

Perhaps a warning or info level log, sure. If there's no further movement on the issue upstream, then feel free to make a PR with those changes here.
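
For anyone picking this up, a minimal sketch of the idea discussed above might look like the following: split the header on `;`, feed each piece to its own `SimpleCookie`, and log a warning when a piece yields nothing. The function name `parse_cookie_header` and the logger are assumptions for illustration only, not existing aiohttp code, and a quoted value that itself contains `;` would still be lost with this approach.

```python
import logging
from http.cookies import CookieError, SimpleCookie

logger = logging.getLogger(__name__)


def parse_cookie_header(raw: str) -> dict[str, str]:
    """Parse each ';'-separated segment separately so one bad cookie
    cannot hide the cookies that follow it."""
    parsed = {}
    for chunk in raw.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        jar = SimpleCookie()
        try:
            jar.load(chunk)
        except CookieError:
            jar = SimpleCookie()  # treat a hard failure like an empty result
        if not jar:
            # Log only the name part, since values may be sensitive.
            logger.warning("Dropping unparsable cookie %r", chunk.partition("=")[0])
            continue
        for name, morsel in jar.items():
            parsed[name] = morsel.value
    return parsed


result = parse_cookie_header('foo=bar; baz="qux; foo2=bar2')
print(result["foo"], result["foo2"])
```

With this sketch the cookies after the malformed `baz` segment survive, and the malformed one is reported in the log instead of being dropped silently.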