Open datawookie opened 3 years ago
The link to your blog post should be: https://datawookie.dev/blog/2021/10/medusa-multi-headed-tor-proxy/ (instead of pointing to localhost) ;) Great work btw!
Thanks, @kaybeudeker, I've updated the URL. Appreciate you bringing that to my attention.
Have you tried this out? I'd really appreciate any feedback.
I had a similar use case: reading proxies from a URL (specifically an API call to a third party which returns a list of proxies, exactly like you have). I created a small utility function which uses `requests.get` to fetch the proxies and assigns the result to `ROTATING_PROXY_LIST` in `settings.py`.
utility function:
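```python
import random
from typing import List
import requests

def get_proxies(proxy_json_end_point: str) -> List[str]:
    # Fetch the proxy list from the third-party JSON endpoint.
    r = requests.get(proxy_json_end_point)
    proxies = r.json()
    # Each entry is split on ";" into host:port, user and password.
    proxy_urls = [
        f"http://{user}:{pwd}@{host_port}"
        for (host_port, user, pwd) in [p.split(";") for p in proxies]
    ]
    random.shuffle(proxy_urls)
    print("Proxies:", proxy_urls)
    return proxy_urls
```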
settings.py:
```python
import os
ROTATING_PROXY_LIST = get_proxies(os.getenv("PROXY_JSON_ENDPOINT"))
```
Note: the `PROXY_JSON_ENDPOINT` environment variable points to the third party's API endpoint, which returns the proxies. I used a similar approach to fetch proxies listed in a text file hosted in S3.
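For the text-file case, the downloaded body still has to be split into individual entries. A minimal sketch, assuming one proxy per line; `parse_proxy_text` is a hypothetical name, skipping `#` comment lines is an added assumption, and fetching the file from S3 is left out:

```python
from typing import List


def parse_proxy_text(text: str) -> List[str]:
    """Turn newline-separated proxy entries into a list.

    Hypothetical helper: assumes one proxy per line and ignores
    blank lines and lines starting with '#'.
    """
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        proxies.append(line)
    return proxies
```

The returned list could then be assigned to `ROTATING_PROXY_LIST` in the same way as above.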
Hi @TeamHG-Memex, any progress on this? This PR has been languishing for a few months now. Thanks, Andrew.
Hi!
We build a lot of web scrapers using Scrapy and I've been using your package for a while now. It's great for managing our multi-proxy setup.
We have been developing a proxy system that shares the proxy list via a URL. I have been dumping the contents of that URL to a file so that I can read it in via `ROTATING_PROXY_LIST_PATH`, but this has become a bit of a pain. It occurred to me that it should be possible to read the proxy list from a URL.
The merge request includes a simple change to the `RotatingProxyMiddleware.from_crawler()` method to make that possible.
Example: Sharing proxy list at http://127.0.0.1:8800.
In `settings.py` I then have:
For context, here's a blog post about the proxy system that we are using in conjunction with `scrapy-rotating-proxies`.
Best regards, Andrew.
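The idea described above amounts to treating the configured value as either a local file path or a URL. A rough sketch of that dispatch, not the actual diff from this merge request: `load_proxy_list` is a hypothetical name, and the one-proxy-per-line format and the 10-second timeout are assumptions.

```python
from typing import List
from urllib.parse import urlparse
from urllib.request import urlopen


def load_proxy_list(source: str) -> List[str]:
    """Read proxies from a local file path or an http(s) URL.

    Hypothetical helper: assumes one proxy per line, UTF-8 encoded.
    """
    if urlparse(source).scheme in ("http", "https"):
        # Remote source: download the list over HTTP.
        with urlopen(source, timeout=10) as response:
            text = response.read().decode("utf-8")
    else:
        # Anything else is treated as a path on the local filesystem.
        with open(source, encoding="utf-8") as f:
            text = f.read()
    # Drop surrounding whitespace and empty lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With a helper along these lines, the same setting value could point at a file on disk or at a proxy service, without the intermediate step of dumping the URL contents to a file.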