Open Dreamsorcerer opened 1 year ago
I profiled my program with yappi and found that .filter_cookies()
accounted for 27.5% (23.1s/83.9s) of the total CPU time consumed by requests.
As we can see, the preparation before filtering is very expensive. https://github.com/aio-libs/aiohttp/blob/7ed2dd3793955736def36ff67044d19a43bdf4d5/aiohttp/cookiejar.py#L237-L252
However, not every request has cookies in its jar: for example, the initial request, or a session that is only used to request URLs that never send cookies (images, videos, files, etc.).
So I have another suggestion: check whether there are any cookies in the jar before doing any of that work.
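To make the suggestion concrete, here is a minimal sketch of the early return. TinyJar, _prepare and prep_calls are hypothetical names used only for illustration; this is not aiohttp's CookieJar, and the real preparation step is only represented by a placeholder.

```python
from http.cookies import SimpleCookie


class TinyJar:
    """Hypothetical stand-in for a cookie jar (not aiohttp's CookieJar)."""

    def __init__(self):
        # (domain, path) -> SimpleCookie, mirroring the layout discussed here.
        self._cookies = {}
        # Counts how often the (placeholder) expensive preparation runs.
        self.prep_calls = 0

    def add(self, domain, path, name, value):
        self._cookies.setdefault((domain, path), SimpleCookie())[name] = value

    def _prepare(self, host):
        # Placeholder for the costly per-request setup (origin, hostname, ...).
        self.prep_calls += 1
        return host.lower()

    def filter_cookies(self, host):
        filtered = SimpleCookie()
        if not self._cookies:
            # Empty jar: nothing can match, so skip the preparation entirely.
            return filtered
        hostname = self._prepare(host)
        for (domain, _path), cookies in self._cookies.items():
            if hostname == domain or hostname.endswith("." + domain):
                for name, morsel in cookies.items():
                    filtered[name] = morsel.value
        return filtered
```

With an empty jar, filter_cookies returns before _prepare is ever called, which is the whole point of the check.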
Open PRs that probably resolve these performance issues: #7784 #7777 #7790
I see. But they do not eliminate the need to call URL.origin(), which is also expensive, even when the jar is empty. Do you think my suggestion is a good idea? If so, I can open a PR.
If it's an easy change, feel free to make a PR, it's easier for me to evaluate the code.
I see origin being expensive in the profile as well. It's much more expensive if the host is an IP address instead of a hostname, because it has to recreate the ip_address object. I think you'll need to do another PR for that one.
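One possible direction for the IP address cost, purely as an illustration: memoize the parsing so the ip_address object is not rebuilt for every request to the same host. parse_ip is a hypothetical helper, and this is not how yarl or aiohttp currently handle it.

```python
import ipaddress
from functools import lru_cache


@lru_cache(maxsize=256)
def parse_ip(host):
    """Return an ip_address object for host, or None if host is not an IP literal.

    Hypothetical helper: results are cached, so repeated lookups of the same
    host reuse the object instead of re-parsing the string.
    """
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        return None
```

Whether a cache like this is worth it would have to be confirmed with a profile; the cache size of 256 is an arbitrary choice.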
It would be nice to have a simple benchmark script to compare the cookie jar before and after changes (and probably the URL dispatcher as well).
The cookie jar and the URL dispatcher tend to be the bottlenecks for large aiohttp installs, so anything we can do to improve them will make things scale much better.
There is a benchmarks repo, which I've not looked at yet; maybe it can be used if it's dusted off? https://github.com/aio-libs/aiohttp-benchmarks
It looks like those are mostly end-to-end benchmarks. Since we already know where the bottlenecks are, I'd be more interested in something that adds 10000 cookies to the cookie jar and times how long it takes to call filter_cookies. Probably one run should use a URL with an IP address, and one should use a hostname.
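A rough sketch of what such a script could look like. ToyJar is a hypothetical stand-in so the script runs on its own; for real numbers it would be replaced with the actual cookie jar, and build_jar and bench_filter are illustrative names, not existing helpers.

```python
import time


class ToyJar:
    """Hypothetical stand-in; swap in the real cookie jar for real numbers."""

    def __init__(self):
        self._cookies = {}  # (domain, path) -> {name: value}

    def update_cookies(self, cookies, domain, path="/"):
        self._cookies.setdefault((domain, path), {}).update(cookies)

    def filter_cookies(self, host):
        out = {}
        for (domain, _path), cookies in self._cookies.items():
            if host == domain or host.endswith("." + domain):
                out.update(cookies)
        return out


def build_jar(n_cookies=10_000):
    """Fill a jar with n_cookies hostname cookies plus one IP-address cookie."""
    jar = ToyJar()
    for i in range(n_cookies):
        jar.update_cookies({f"name{i}": str(i)}, f"host{i}.example.com")
    jar.update_cookies({"ip_cookie": "1"}, "127.0.0.1")
    return jar


def bench_filter(jar, host, repeat=20):
    """Return the mean wall-clock seconds per filter_cookies(host) call."""
    start = time.perf_counter()
    for _ in range(repeat):
        jar.filter_cookies(host)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    jar = build_jar()
    # One run with a hostname, one with an IP address, as suggested above.
    for host in ("host42.example.com", "127.0.0.1"):
        print(f"{host}: {bench_filter(jar, host) * 1e3:.3f} ms per call")
```

Running it before and after a change gives two numbers per host type to compare; time.perf_counter is used because it is monotonic and has high resolution.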
For the URL dispatcher, add 5000 resources and see how much time it takes to dispatch to the last one in the list vs. the first one in the list.
While looking through #7577, we found a few details that could possibly be improved.
Cookies are stored as self._cookies[(domain, path)][name], therefore we could do domain-matching and path-matching on the keys instead of testing every single cookie. https://github.com/aio-libs/aiohttp/blob/4639b36852f3b4939e61311d7863f1d7450d500d/aiohttp/cookiejar.py#L141-L142
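A minimal sketch of matching on the keys, assuming the store is a plain dict keyed by (domain, path). domain_matches, path_matches and filter_by_key are hypothetical helpers with simplified rules (for instance, IP address hosts are not treated specially), and per-cookie checks such as expiry or the secure flag would still be needed afterwards.

```python
def domain_matches(domain, hostname):
    """Simplified domain-match: exact match or hostname is a subdomain."""
    return hostname == domain or hostname.endswith("." + domain)


def path_matches(req_path, cookie_path):
    """Simplified path-match: cookie_path must be a path prefix of req_path."""
    if req_path == cookie_path:
        return True
    if not req_path.startswith(cookie_path):
        return False
    # The prefix must end on a "/" boundary, so "/api" does not match "/apix".
    return cookie_path.endswith("/") or req_path[len(cookie_path)] == "/"


def filter_by_key(store, hostname, path):
    """Collect cookies whose (domain, path) key matches the request.

    The domain and path tests run once per key rather than once per cookie.
    """
    matched = {}
    for (domain, cookie_path), cookies in store.items():
        if not domain_matches(domain, hostname):
            continue
        if not path_matches(path, cookie_path):
            continue
        matched.update(cookies)
    return matched
```

If many cookies share the same (domain, path) key, this does one domain check and one path check for the whole group instead of one per cookie.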