aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

CookieJar improvements #7583

Open Dreamsorcerer opened 1 year ago

Dreamsorcerer commented 1 year ago

While looking through #7577, we found a few details that could possibly be improved.

Rongronggg9 commented 1 year ago

I profiled my program with yappi and found that .filter_cookies() consumed 27.5% (23.1s/83.9s) of the total CPU time consumed by requests.

As we can see, the preparation before filtering is very expensive. https://github.com/aio-libs/aiohttp/blob/7ed2dd3793955736def36ff67044d19a43bdf4d5/aiohttp/cookiejar.py#L237-L252

However, not every request has cookies in its jar: for example, the initial request, or a session that is only used to request URLs that never set cookies (images, videos, files, etc.).

So I have another suggestion: test if there are any cookies in the jar before really doing anything.
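A minimal sketch of that idea, using a toy jar rather than aiohttp's actual `CookieJar`. The class `TinyJar`, its `_prepare` helper, and the `expensive_calls` counter are all hypothetical names made up for illustration; the only point is that the emptiness check comes before any costly preparation.

```python
from http.cookies import SimpleCookie


class TinyJar:
    """Toy stand-in for a cookie jar (hypothetical, not aiohttp's CookieJar)."""

    def __init__(self):
        self._cookies = {}  # (domain, name) -> value
        self.expensive_calls = 0  # counts how often the costly set-up runs

    def update(self, domain, name, value):
        self._cookies[(domain, name)] = value

    def _prepare(self, host):
        # Stands in for the expensive work done before filtering
        # (URL normalisation, origin lookup, and so on).
        self.expensive_calls += 1
        return host.lower()

    def filter_cookies(self, host):
        filtered = SimpleCookie()
        if not self._cookies:
            # Fast path: the jar is empty, so skip all preparation.
            return filtered
        host = self._prepare(host)
        for (domain, name), value in self._cookies.items():
            if host == domain or host.endswith("." + domain):
                filtered[name] = value
        return filtered
```

With an empty jar, `filter_cookies()` returns immediately and `_prepare()` is never reached; the preparation cost is only paid once at least one cookie has been stored.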

Dreamsorcerer commented 1 year ago

Open PRs that probably resolve these performance issues: #7784 #7777 #7790

Rongronggg9 commented 1 year ago

> Open PRs that probably resolve these performance issues: #7784 #7777 #7790

I see. But they do not eliminate the need to call URL.origin(), which is also expensive, even when the jar is empty. Do you think my suggestion is a good idea? If so, I can open a PR.

Dreamsorcerer commented 1 year ago

If it's an easy change, feel free to make a PR, it's easier for me to evaluate the code.

bdraco commented 1 year ago

> I see. But they do not eliminate the need to call URL.origin(), which is also expensive, even when the jar is empty. Do you think my suggestion is a good idea? If so, I can open a PR.

I see origin being expensive in the profile as well. It's much more expensive if the host is an IP address instead of a hostname, because it has to recreate the ip_address object. I think you'll need to do another PR for that one.
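To illustrate the cost of rebuilding the object on every call, here is a rough stdlib-only sketch. It does not reproduce what yarl or aiohttp do internally; `cached_ip_address` and `is_ip_address` are hypothetical helpers, and caching with `functools.lru_cache` is just one possible way to avoid the repeated construction.

```python
import ipaddress
import timeit
from functools import lru_cache


@lru_cache(maxsize=256)
def cached_ip_address(host):
    # Parsed once per distinct host string, then served from the cache.
    return ipaddress.ip_address(host)


def is_ip_address(host):
    """Return True if host parses as an IPv4 or IPv6 address."""
    try:
        cached_ip_address(host)
    except ValueError:
        # lru_cache does not store exceptions, so plain hostnames
        # are re-parsed (and rejected) on every call.
        return False
    return True


if __name__ == "__main__":
    calls = 10_000
    uncached = timeit.timeit(
        lambda: ipaddress.ip_address("127.0.0.1"), number=calls
    )
    cached = timeit.timeit(lambda: cached_ip_address("127.0.0.1"), number=calls)
    print(f"uncached: {uncached:.4f}s  cached: {cached:.4f}s")
```

The timings depend on the machine and Python version, so the script prints them rather than assuming a particular ratio.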

bdraco commented 1 year ago

It would be nice if we had a simple benchmark script to compare before and after changes for the cookie jar (probably the url dispatcher as well).

The cookie jar and the URL dispatcher tend to be the bottlenecks for large aiohttp installs, so anything we can do to improve them will make things scale much better.

Dreamsorcerer commented 1 year ago

There is a benchmarks repo, which I've not looked at yet. Maybe if it is dusted off, it can be used? https://github.com/aio-libs/aiohttp-benchmarks

bdraco commented 1 year ago

It looks like those are mostly end-to-end benchmarks. Since we already know where the bottlenecks are, I'd be more interested in something that adds 10000 cookies to the cookie jar and times how long it takes to call filter_cookies. Probably one run should have an IP address in the URL, and one should have a hostname.
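The shape of such a micro-benchmark could look roughly like the sketch below. `ToyJar` is a hypothetical stand-in so the script runs with only the standard library; `fill_jar`, `time_filter`, and `run_benchmark` are illustrative names, and the two URLs are arbitrary examples.

```python
import timeit
from urllib.parse import urlsplit


class ToyJar:
    """Hypothetical stand-in exposing update_cookies() and filter_cookies()."""

    def __init__(self):
        self._by_host = {}

    def update_cookies(self, cookies, url):
        host = urlsplit(url).hostname
        self._by_host.setdefault(host, {}).update(cookies)

    def filter_cookies(self, url):
        host = urlsplit(url).hostname
        return dict(self._by_host.get(host, {}))


def fill_jar(jar, count, url):
    # Add `count` distinct cookies, all set by the same URL.
    for i in range(count):
        jar.update_cookies({f"name{i}": f"value{i}"}, url)


def time_filter(jar, url, calls=100):
    # Average seconds per filter_cookies() call.
    return timeit.timeit(lambda: jar.filter_cookies(url), number=calls) / calls


def run_benchmark(jar_factory, count=10_000):
    results = {}
    for label, url in (
        ("hostname", "http://example.com/"),
        ("ip", "http://127.0.0.1/"),
    ):
        jar = jar_factory()
        fill_jar(jar, count, url)
        results[label] = time_filter(jar, url)
    return results


if __name__ == "__main__":
    for label, seconds in run_benchmark(ToyJar).items():
        print(f"{label}: {seconds * 1e6:.1f} us per call")
```

To measure the real jar, `jar_factory` would presumably be swapped for something that builds an aiohttp `CookieJar` (likely with `unsafe=True` so cookies from the IP address URL are accepted) and the URLs would be passed as `yarl.URL` objects; that adaptation is an assumption and is not shown here.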

For the URL dispatcher, add 5000 resources and see how much time it takes to dispatch to the last one in the list vs. the first one in the list.
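A comparable sketch for the dispatcher measurement, again with a toy stand-in: `LinearDispatcher` is a hypothetical class that scans its routes in registration order, which is only an assumption about why the last resource would be slower to reach than the first. It is not aiohttp's `UrlDispatcher`.

```python
import timeit


class LinearDispatcher:
    """Hypothetical dispatcher that checks routes in registration order."""

    def __init__(self):
        self._routes = []

    def add_route(self, path, handler):
        self._routes.append((path, handler))

    def resolve(self, path):
        for candidate, handler in self._routes:
            if candidate == path:
                return handler
        return None


def build_dispatcher(count=5000):
    dispatcher = LinearDispatcher()
    for i in range(count):
        # The handler is just the route index, to keep the sketch small.
        dispatcher.add_route(f"/resource/{i}", i)
    return dispatcher


def time_resolve(dispatcher, path, calls=200):
    # Average seconds per resolve() call.
    return timeit.timeit(lambda: dispatcher.resolve(path), number=calls) / calls


if __name__ == "__main__":
    dispatcher = build_dispatcher()
    first = time_resolve(dispatcher, "/resource/0")
    last = time_resolve(dispatcher, "/resource/4999")
    print(f"first: {first * 1e6:.2f} us  last: {last * 1e6:.2f} us")
```

Running the same two timings before and after a change to the real dispatcher would give the before/after comparison asked for above.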