aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org
Other
15.08k stars 2.01k forks source link

ClientSession cookies does not work with IP #1183

Open imbolc opened 8 years ago

imbolc commented 8 years ago

Long story short

I use 127.0.0.1 instead of localhost for development and wasted few hours before understand why my tests breaks :)

import asyncio
import aiohttp

async def main():
    urls = [
        'http://httpbin.org/cookies/set?test=ok',
        'http://54.175.219.8/cookies/set?test=ok',
    ]
    for url in urls:
        with aiohttp.ClientSession() as s:
            async with s.get(url) as r:
                print(await r.json())
                print(list(s.cookie_jar))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Actual behaviour

{'cookies': {'test': 'ok'}}
[<Morsel: test=ok; Domain=httpbin.org; Path=/>]
{'cookies': {}}
[]

Your environment

Last stable release of aiohttp, linux

asvetlov commented 8 years ago

It's intended behavior. Please use aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar(unsafe=True)) for enabling cookie processing for IP addresses.

imbolc commented 8 years ago

Thanks, it works :)

I see mention of this in the docs in CookieJar section. Actually I was looking for answer in the docs before I asked here, but I didn't found. Maybe it deserves to be mention in the main section of docs, because working with IP looks obvious and because this behavior differs from last release.

asvetlov commented 8 years ago

unsafe option was introduced by 0.22.3 as workaround for new aiohttp.CookieJar from 0.22.0.

Please make a PR for docs update. You know as aiohttp code writer I'm not the best person for finding quirks and subtle places in the documentation. Maybe unsafe flag should be mentioned not in the main section but in testing chapter?

imbolc commented 8 years ago

Yes, I can up docs. Could you please explain me little more about safety. Working with cookies in requests by IP itself does not seem unsafe. Because browsers, requests and aiohttp < 0.22.3 are doing this. Why should not this behavior stay default?

asvetlov commented 8 years ago

RFC 6265 declares in section 5.1.3. Domain Matching: A string domain-matches a given domain string if at least one of the following conditions hold:

Thus by the last rule IP addresses are not considered as safe source. Well, we could relax the rule but right now I want to keep it for sake of security. Most likely IP addresses are needed for testing use cases only.

P.S. Technically we may add all these options like unsafe, verify_ssl etc. into ClientSession ctor but it brings much more mess into the library maintenance.

imbolc commented 8 years ago

I understood, thank you :)

Besides the tests, IPs may be used widely in micro-service based architecture, clusters etc, to avoid any dns queries. But I think most important case we need to consider is implementing of libs. Now any open-source libs which use aiohttp client should change its APIs to pass unsafe param. And developers who will implement new libs should be known about it, to include the param to their API.

I'm not sure where is the right way: to be more secure by default or to work as user expect intuitively. I think requests is the source of expectations now, and it works with IPs by default.

asvetlov commented 8 years ago

Most likely you overestimate the problem. We may speculate a lot about how unsafe mode is important -- but the reality is: you have raised the first issue about IP addresses for more than two months of the feature existence.

requests has no own cookie jar, it uses standard python's implementation. Which in turn is very old, written in outmoded style, has bad test coverage and really not supported by core developers. It is the perfect example of death in stdlib. I prefer to follow RFC updates if possible. At least by default.

wood-j commented 5 years ago

It's intended behavior. Please use aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar(unsafe=True)) for enabling cookie processing for IP addresses.

thanks alooooooooooooooooooooot

kohtala commented 4 years ago

@asvetlov, I think the interpretation of RFC 6265 is not right.

IP addresses are safe according to the first rule: "The domain string and the string are identical."

That is sufficient. The other condition is about matching subdomains and there IP addresses are not safe. That condition fails, but the first sentence says "at least one of the ... conditions", so it does not matter.

I quote the section here to show the two levels of lists:

5.1.3. Domain Matching

A string domain-matches a given domain string if at least one of the following conditions hold:

  • The domain string and the string are identical. (Note that both the domain string and the string will have been canonicalized to lower case at this point.)

  • All of the following conditions hold:

    • The domain string is a suffix of the string.

    • The last character of the string that is not included in the domain string is a %x2E (".") character.

    • The string is a host name (i.e., not an IP address).

This would explain why everyone else does it differently.

Thanks for your work on aiohttp!

asvetlov commented 4 years ago

Domain string is not an IP address but the domain name, isn't it?

Anyway, the issue appears only if local server is tested IMHO. Passing unsafe for this case is pretty reasonable.

kohtala commented 4 years ago

The component processing defined in 5.1.3 is domain-matching with values as specified in 5.3 step 6 and 5.4 step 1.

The string is request-host as defined in 2.3 and domain-string is based on 4.1.1 syntax that says

domain-value = ; defined in [RFC1034], Section 3.5, as ; enhanced by [RFC1123], Section 2.1

where the RFC1123 section 2.1 reference allows IP addresses as domain strings.

... One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit.

aiohttp currently requires unsafe for RFC compliant behaviour. I fail to see what is unsafe about matching identical IP addresses. Rules in 5.3 and 5.4 make sure the cookie is only sent to origin server and nowhere else.

kohtala commented 3 years ago

I was hit by this again before remembering the restriction. Not all production hosts are in DNS, it would be useless to add to DNS when communication is machine-to-machine with static IP addresses. Some are isolated networks without DNS. I've even seen TLS certificates for IP addresses in use. aiohttp is useful in machine-to-machine.

@asvetlov, I hope this could be reopened.

Let me try to go through the RFC more clearly than before to show why I believe aiohttp is not working as intended by RFC 6265. I think the comment in https://github.com/aio-libs/aiohttp/issues/968#issuecomment-233256266 is inaccurate. Could @kxepal review and if he still believes so clarify exactly why.

The cookie received in Set-Cookie is stored according to RFC 6265 5.3 step 6.

5.3.  Storage Model
...
   6.   If the domain-attribute is non-empty:

           If the canonicalized request-host does not domain-match the
           domain-attribute:

              Ignore the cookie entirely and abort these steps.

           Otherwise:

              Set the cookie's host-only-flag to false.

              Set the cookie's domain to the domain-attribute.

        Otherwise:

           Set the cookie's host-only-flag to true.

           Set the cookie's domain to the canonicalized request-host.

When the domain-attribute is empty, there is no requirement to domain-match. Instead, it is host-only and cookies domain is the request-host.

When sending the Cookie, the 5.4 says

 1.  Let cookie-list be the set of cookies from the cookie store that
       meets all of the following requirements:

       *  Either:

             The cookie's host-only-flag is true and the canonicalized
             request-host is identical to the cookie's domain.

          Or:

             The cookie's host-only-flag is false and the canonicalized
             request-host domain-matches the cookie's domain.

       * ...

Since it is host-only, the first condition matches. Again, without domain-matching. In host-only case the domain is only limited by the request-host as defined in 2.3

2.3.  Terminology
...
   The request-host is the name of the host, as known by the user agent,
   to which the user agent is sending an HTTP request or from which it
   is receiving an HTTP response (i.e., the name of the host to which it
   sent the corresponding HTTP request).
...

This does not limit the way user agent knows the host, except to sufficiently identify it sends request to same host as where the response was received. Actually, this definition has room to allow HTTP Cookies used over unix sockets as well as for example over AMQP that identify the server host differently.

As agents use URL the host, as known by the user agent, refers to URL definition RFC 3986 3.2.2 that allows IP addresses as host.

If domain-attribute were not empty, the request-host would be domain-matched against domain-value. domain-value is defined in 4.1.1 that says

 domain-value      = <subdomain>
                       ; defined in [RFC1034], Section 3.5, as
                       ; enhanced by [RFC1123], Section 2.1

where the RFC1123 section 2.1 enhancement allows IP addresses to identify hosts.

   2.1  Host Names and Numbers

      ... One aspect of host name syntax is hereby changed: the
      restriction on the first character is relaxed to allow either a
      letter or a digit.  Host software MUST support this more liberal
      syntax.
...
      Whenever a user inputs the identity of an Internet host, it SHOULD
      be possible to enter either (1) a host domain name or (2) an IP
      address in dotted-decimal ("#.#.#.#") form.

Note that they took the trouble in RFC 6265 to bring out the IP Address enhancement to domain-value definition. Clearly they want IP addresses there.

The 5.1.3 domain-matching in turn says

5.1.3.  Domain Matching

   A string domain-matches a given domain string if at least one of the
   following conditions hold:

   o  The domain string and the string are identical.  (Note that both
      the domain string and the string will have been canonicalized to
      lower case at this point.)

   o  All of the following conditions hold:

      *  The domain string is a suffix of the string.

      *  The last character of the string that is not included in the
         domain string is a %x2E (".") character.

      *  The string is a host name (i.e., not an IP address).

It says at least one of the conditions. When the strings are identical, the second condition about not an IP addresses need not be held. Since that string is used to connect to the host, identical match uniquely identifies the host. Since they already specifically allowed IP addresses as domain-value, the third condition of second condition is there to protect against specifying domain-attribute that would be a suffix of an IP address.

RFC 2965 for Set-Cookie2 strongly discourages use of IP addresses, but the RFC 6265 that obsoleted it and is current seems to have removed that and it only mentions addresses in that one condition.

aiohttp currently requires unsafe for RFC compliant behaviour.

Other clients that send cookies to IP addresses are not violating RFC.

Since IPv6-only networks are becoming popular, I did some research into IPv6 and it seems at least Firefox fixed a bug in supporting it recently. Earlier there was similar issue in a HttpClient. None of the updates to RFC 1123 extended host name to IPv6, but since IPv6 host has a definition for URL in RFC 3968, and it clearly is the intention to allow IP addresses as host names and in domain attribute, it should be safe to take the URL host name format as a definition of the domain-value for IPv6. Note that there is also RFC 6874 to add the IPv6 zone identifiers into the host in URL. Browsers do not like the zone identifiers due to security concerns, but as aiohttp is used machine-to-machine, it could have some use.

Phew, if nothing else, the length of this should convince I see this as a problem.

Dreamsorcerer commented 3 years ago

Not all production hosts are in DNS, it would be useless to add to DNS when communication is machine-to-machine with static IP addresses.

Yes, but it's a bit unconventional to use cookies in this situation. I think this is definitely an edge case that very few users encounter.

aiohttp currently requires unsafe for RFC compliant behaviour.

I agree with the analysis, that there is no reason to hide this behind an 'unsafe' flag.

As this affects very few people, it's unlikely any of the maintainers are going to look at changing it. However, I think we'd accept a PR if someone wants to correct this behaviour.

kohtala commented 3 years ago

Thank you. I know. I've run into some REST API that also has web frontend that uses cookies and therefore authenticates with a cookie session. Another server uses cookie not for authentication, but to know which events the client has received and there also to support direct usage from a browser. So the cases have been that there is a second usage with browser, but machine-to-machine usage as well. Anyway, solutions are such that DNS is often not available, too unreliable or requires some IT effort that is too much trouble compared to what it is worth.

I do perfectly understand. Knowing you are open for PR is good enough. I'll see if I find the time, but if @imbolc, @wood-j or someone else have time, please let me know.

smheidrich commented 1 year ago

This just bit me in the posterior when I was testing cookie handling against a dummy server created using aiohttp's test utils: BaseTestServer.make_url returns an IP-address-based URL by default, so any cookies set by the dummy server are ignored.

The TestClient docs on the same page even mention that one should probably set cookie_jar=CookieJar(unsafe=True, treat_as_secure_origin="http://127.0.0.1"), but I wish this had been more prominent and mentioned in the howto above as well because it seems like a common pitfall.

Dreamsorcerer commented 1 year ago

Maybe we should make that the default CookieJar in TestClient (or atleast if the default host is used)...