carsonyl / pypac

Find and use proxy auto-config (PAC) files with Python and Requests.
https://pypac.readthedocs.io
Apache License 2.0

Use a cache for domain lookup #58

Open Res260 opened 3 years ago

Res260 commented 3 years ago

Right now, only the proxy parsing gets cached: [image]

There should be a mechanism to avoid evaluating the (costly) JS multiple times for the same domain name. I've been profiling my app, which makes a lot of requests, and this is a bottleneck.

carsonyl commented 3 years ago

This issue is tricky because of timeRange(). A PAC file that calls it can return a different result for the same host depending on the current time. Though that function may be rarely used, its existence means caching to reduce JS calls could introduce incorrect behaviour. pac_context_for_url() is a different approach to avoiding JS evaluation.
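
For illustration only, the trade-off above could be handled on the caller's side with a small time-limited cache. This is a minimal sketch, not part of pypac: `HostCachingResolver`, `ttl_seconds` and `clock` are hypothetical names, and the only thing assumed about the wrapped object is that it exposes `get_proxy(url)`. It also assumes the PAC result depends only on scheme and host, which is not true of every PAC file.

```python
import time
from urllib.parse import urlsplit


class HostCachingResolver:
    """Hypothetical wrapper: memoize get_proxy() results per (scheme, host).

    Entries expire after ttl_seconds so that time-dependent PAC logic
    (for example timeRange()) is re-evaluated at least that often.
    """

    def __init__(self, resolver, ttl_seconds=60.0, clock=time.monotonic):
        self._resolver = resolver   # any object with a get_proxy(url) method
        self._ttl = ttl_seconds
        self._clock = clock         # injectable so the expiry can be tested
        self._cache = {}            # (scheme, host) -> (expiry, result)

    def get_proxy(self, url):
        parts = urlsplit(url)
        key = (parts.scheme, parts.hostname)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh entry: skip the wrapped resolver
        result = self._resolver.get_proxy(url)
        self._cache[key] = (now + self._ttl, result)
        return result
```

A shorter TTL narrows the window in which a stale answer can be returned but saves fewer evaluations. A PAC file that branches on the URL path would still be served incorrectly from this cache.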

I did a quick performance check with the following notebook snippet:

%load_ext autoreload
%autoreload 2
from pypac.parser import PACFile
from pypac.resolver import ProxyResolver

pac = PACFile(
    """
function FindProxyForURL(url, host) {

  if (isPlainHostName(host) || dnsDomainIs(host, ".mydomain.com"))
    return "DIRECT";

  else if (shExpMatch(host, "*.com"))
    return "PROXY proxy1.mydomain.com:8080; " +
           "PROXY proxy4.mydomain.com:8080";

  else if (shExpMatch(host, "*.edu"))
    return "PROXY proxy2.mydomain.com:8080; " +
           "PROXY proxy4.mydomain.com:8080";

  else
    return "PROXY proxy3.mydomain.com:8080; " +
           "PROXY proxy4.mydomain.com:8080";
}
"""
)

resolver = ProxyResolver(pac)
#%%
%timeit resolver.get_proxy("http://example.com")

Results: