encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License
13.09k stars 834 forks source link

Slow performance when merging cookies #2875

Open gyula-lakatos opened 1 year ago

gyula-lakatos commented 1 year ago

There is a quite significant performance degradation when cookies are being used with requests.

httpx._client.BaseClient._merge_cookies calls into the cookiejar here:

        if cookies or self.cookies:
            merged_cookies = Cookies(self.cookies)
            merged_cookies.update(cookies)
            return merged_cookies
        return cookies

When either cookies or self.cookies is not None, then httpx._models.Cookies.__bool__ will call into a method called deepvalues in http.cookiejar.CookieJar by indirectly calling http.cookiejar.CookieJar.__iter__.

httpx._models.Cookies.__bool__:

    def __bool__(self) -> bool:
        for _ in self.jar:
            return True
        return False

http.cookiejar.CookieJar.__iter__:

    def __iter__(self):
        return deepvalues(self._cookies)

Deepvalues is a significant performance hog because it does a lot of things recursively.

A suggested solution is to check cookies and self.cookies for None instead of using __bool__.

I dropped a performance snapshot to illustrate the problem:

image

Here deepvalues takes up almost 17% of the CPU time.

tomchristie commented 11 months ago

Pulling in the stdlib http.cookiejar and urllib.request modules brings in quite a lot of submerged complexity... I'd wonder if we really need that for the httpx.Cookies implementation.

vergeev commented 11 months ago

Could we instead be using a plain dictionary format under-the-hood for the cookie persistence?

I think the biggest obstacle here is the fact that the jar attribute of Cookies class is a part of a public interface: https://github.com/encode/httpx/blob/fbe35add82032d119d60c7f4de9bce2ccb12f6a1/docs/api.md?plain=1#L154

Users may want to access it in order to customize the cookie policy (which is set during initialization and accessed when adding a header). So a compatibility layer may be warranted, and it can get complex.

Other than enforcing a cookie policy, the CookieJar appears to be mostly concerned about thread-safety, and it could be replicated in the Cookies class.

Which cookie attributes does httpx honor? For example, does "expires" actually take affect?

It's domain, path and expires. The domain and path are tested, and expires is enforced at the CookieJar level. Each time httpx.Cookies.set_cookie_header is called, it calls http.cookiejar.CookieJar. add_cookie_header, which clears the expired cookies: https://github.com/python/cpython/blob/24b5cbd3dce3fe37cdc787ccedd1e73a4f8cfc3c/Lib/http/cookiejar.py#L1387.

T-256 commented 6 days ago

With #3322 I removed _merge_cookies. There is another place in httpx.Request that bool check for cookies. @gyula-lakatos Do you able to reproduce with #3322 ?