Ostapp opened this issue 5 years ago
For anyone who still wants the answer to this: yes, it assigns a new user agent to each request. You can see exactly how here: https://pypi.org/project/fake-useragent/
tl;dr: you can use the `RANDOM_UA_TYPE` setting (which defaults to `random`), and the middleware will generate a new user-agent string for each request based on that type.
Thanks to @alecxe for providing this great project.
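For reference, a minimal settings sketch of that setup (middleware path as documented in the scrapy-fake-useragent README; the `400` priority is just an example):

```python
# settings.py -- sketch of the scrapy-fake-useragent setup
DOWNLOADER_MIDDLEWARES = {
    # disable the built-in UA middleware so it does not overwrite ours
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}

# 'random' is the default; fake-useragent also supports browser-specific
# types such as 'chrome' or 'firefox'
RANDOM_UA_TYPE = 'random'
```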
For scrapy-proxies, I wonder what you mean by setting RANDOM_UA_PER_PROXY to True?
To use it with a random-proxy middleware such as scrapy-proxies, you need to:
- set `RANDOM_UA_PER_PROXY` to `True` to allow switching the user agent per proxy;
- set the priority of `RandomUserAgentMiddleware` to be greater than that of scrapy-proxies, so that the proxy is set before the UA is handled.
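Put together, the two steps above might look like this in settings.py (a sketch; the priority numbers are examples, chosen so that the proxy middleware's `process_request` runs before the UA middleware's):

```python
# settings.py -- combined scrapy-proxies + scrapy-fake-useragent (sketch)
DOWNLOADER_MIDDLEWARES = {
    # proxy is assigned first (lower number = earlier in process_request)
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    # disable the built-in UA middleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # higher number = runs later, after the proxy has been set
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}

# one user agent per proxy address
RANDOM_UA_PER_PROXY = True
```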
Do I need to first `pip install scrapy_proxies` and then add `RANDOM_UA_PER_PROXY = True` in my settings.py? Or is it already included, so I can add `RANDOM_UA_PER_PROXY = True` directly?
Also, for the scrapy_proxies priority, do I need to add separate `DOWNLOADER_MIDDLEWARES` entries for scrapy-proxies? I mean there would be entries for both libraries, and I then just set the fake-useragent priorities to be larger than those of scrapy-proxies, so I can have proxy + fake user agent together?
Because you mentioned that fake-useragent needs the built-in `UserAgentMiddleware` and `RetryMiddleware` turned off, while scrapy-proxies uses `RetryMiddleware`, I am confused about whether I should keep `RetryMiddleware` in `DOWNLOADER_MIDDLEWARES` or not. Thanks in advance!
```python
# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Proxy list containing entries like
# http://host1:port
# http://username:password@host2:port
# http://host3:port
# ...
PROXY_LIST = '/path/to/proxy/list.txt'

# Proxy mode
# 0 = Every request gets a different proxy
# 1 = Take only one proxy from the list and assign it to every request
# 2 = Put a custom proxy to use in the settings
PROXY_MODE = 0

# If proxy mode is 2, uncomment this line:
#CUSTOM_PROXY = "http://host1:port"
```
@i-chaochen thank you for the kind words and the questions. Docs could definitely be better for this project, I agree.
> Do I need to first `pip install scrapy_proxies`, and then I add `RANDOM_UA_PER_PROXY = True` in my settings.py? Or is it already included and I can add `RANDOM_UA_PER_PROXY = True` directly?
Yeah, `scrapy_proxies` is not listed in the project requirements, so you would need to install it separately.
> Also, for the `scrapy_proxies` priority, do I need to add another `DOWNLOADER_MIDDLEWARES` entry for scrapy-proxies? I mean there will be two entries, respectively. And I then just set the fake-useragent priorities larger than scrapy-proxies, so I can have proxy + fake user agent together?
It seems so. Though, I have not used this combination of scrapy-fake-useragent and scrapy-proxies myself. I'd say do some experimentation with the middlewares setup while logging proxies and headers.
Hope that helps.
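One way to do that experimentation: a small debug middleware (hypothetical, not part of either library) that logs which proxy and User-Agent each outgoing request ends up with. Give it a priority number higher than both middlewares so it sees the final values:

```python
import logging

logger = logging.getLogger(__name__)

class ProxyUALoggingMiddleware:
    """Hypothetical debug middleware: log the proxy and User-Agent that
    each request carries after the other middlewares have run."""

    def process_request(self, request, spider):
        proxy = request.meta.get('proxy')
        ua = request.headers.get('User-Agent')
        logger.debug('proxy=%s ua=%s url=%s', proxy, ua, request.url)
        return None  # returning None lets processing continue as normal
```

Enable it in `DOWNLOADER_MIDDLEWARES` at a high number (say 900) and run with `LOG_LEVEL = 'DEBUG'` to see the pairings in the log.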
@alecxe Thanks. After reading your code and a couple of tries I think I figured it out and tested it OK.
In settings.py:

```python
RANDOM_UA_PER_PROXY = True
```

and in the spider, set the proxy per request:

```python
yield scrapy.Request(url, meta={'proxy': your_proxy_address})
```

But just remember: if we set `RANDOM_UA_PER_PROXY = True`, the UA is fixed for each proxy address rather than randomized per request; it only changes when the proxy changes.
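To make that behavior concrete, here is a toy simulation (illustrative only, not the library's actual code, with a made-up UA pool) of what `RANDOM_UA_PER_PROXY = True` implies: the first request through a proxy picks a user agent, and every later request through the same proxy reuses it:

```python
import random

# Hypothetical pool of user-agent strings for illustration
UA_POOL = ['UA-chrome', 'UA-firefox', 'UA-safari']

_per_proxy_ua = {}

def ua_for_proxy(proxy):
    """Pick a UA the first time a proxy is seen; reuse it afterwards."""
    if proxy not in _per_proxy_ua:
        _per_proxy_ua[proxy] = random.choice(UA_POOL)
    return _per_proxy_ua[proxy]
```

Two requests through `http://host1:8080` get the same UA; a request through a different proxy may get a different one.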
Does it assign a different UA to each request? Does it assign a different UA to each request retry?