Closed: reyman closed this issue 7 years ago
@reyman Hi! Good question!
The problem is, the middleware would not re-assign a user-agent if it is set explicitly, like in your example.
If you want to let the middleware set a random user agent, just don't set the User-Agent
header:
def start_requests(self):
    cf_requests = []
    for url in self.start_urls:
        token, agent = cfscrape.get_tokens(url)
        self.logger.info("agent = %s", agent)
        cf_requests.append(scrapy.Request(url=url, cookies=token))
    return cf_requests
Hope I'm understanding the problem correctly. Thanks.
@alecxe Thanks for your answer. I tried it like that, but the User-Agent still differs; I think my question is not as clear as I thought.
My problem is that cfscrape picks a random user_agent from a limited list written directly in its code (see here) if no user_agent is defined when I run cfscrape.get_tokens(url).
So the only way I see is to get the random User-Agent generated by your middleware and inject it into cfscrape.get_tokens(). This method makes a first call to url to resolve the Cloudflare anti-bot measure, and then returns a cookie which authorizes the next url requests.
But I suppose it is not possible to get the User-Agent (for example ua) generated by your middleware before start_requests(self) runs cfscrape.get_tokens(url, ua)?
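The fallback behaviour described above can be sketched roughly as follows; the list contents and the helper name are made up for illustration, this is not cfscrape's actual code:

```python
import random

# Illustrative stand-in for cfscrape's hardcoded fallback list
# (the real entries live in cfscrape's source; these are placeholders).
DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12)",
]

def pick_user_agent(user_agent=None):
    # An explicitly supplied User-Agent always wins; otherwise fall
    # back to a random entry from the built-in list.
    return user_agent or random.choice(DEFAULT_USER_AGENTS)
```

This is why passing the middleware's User-Agent into get_tokens() explicitly is the only way to keep both sides consistent.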
@reyman gotcha. Well, not the most beautiful solution, but as a workaround, you can try generating the random User-Agent directly with:
from fake_useragent import UserAgent

ua = UserAgent()
user_agent = ua.random
Please let me know if this is good enough... thanks.
Yeah, it works like that, thanks :+1:
from fake_useragent import UserAgent

ua = UserAgent()
...

def start_requests(self):
    cf_requests = []
    user_agent = self.ua.random
    self.logger.info("RANDOM user_agent = %s", user_agent)
    for url in self.start_urls:
        token, agent = cfscrape.get_tokens(url, user_agent)
        self.logger.info("token = %s", token)
        self.logger.info("agent = %s", agent)
        cf_requests.append(scrapy.Request(url=url,
                                          cookies=token,
                                          headers={'User-Agent': agent}))
    return cf_requests
Hi, this is more a question than an issue, I suppose, but perhaps you can help me. I'm trying to create a scraper using your extension together with cfscrape, privoxy, and scrapy_fake_useragent.
I'm using the cfscrape python extension to bypass Cloudflare protection with scrapy. To collect the cookie needed by cfscrape, I need to redefine the start_requests function in my spider class, like this:
My problem is that the user_agent collected by start_requests is not the same as the user_agent randomly selected by scrapy_fake_useragent, as you can see:
I defined my extensions in this order:
I need the same user_agent, so how can I pass the right user agent generated by scrapy_fake_useragent into the start_requests method?
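For context, scrapy_fake_useragent is typically enabled in settings.py along these lines. The priorities below follow the package's README of that era; they are a starting point, not the OP's exact (elided) configuration:

```python
# settings.py -- a typical setup for scrapy_fake_useragent
# (priorities are illustrative; the OP's actual order was not shown)
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in user-agent middleware so it does not
    # overwrite the randomly chosen header.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
```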