raresample closed 8 years ago
You're missing the cf_clearance cookie.
Try replacing
cookies={'__cfduid': token['__cfduid']},
with
cookies=token
(The variable should probably be named tokens.)
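As a rough sketch of that change (not the exact spider from the pastebin): the helper name build_request_kwargs and the sample cookie values below are made up for illustration, and the tokens dict stands in for the first value returned by cfscrape.get_tokens(). The point is only that the whole dict is forwarded, so cf_clearance travels along with __cfduid.

```python
def build_request_kwargs(url, tokens):
    """Build keyword arguments for a scrapy.Request (hypothetical helper).

    `tokens` is assumed to be the cookie dict returned by
    cfscrape.get_tokens(); every cookie is forwarded, not just __cfduid.
    """
    return {
        "url": url,
        "cookies": dict(tokens),  # copy all cookies, including cf_clearance
    }


# Made-up values standing in for the dict from cfscrape.get_tokens(url).
tokens = {"__cfduid": "example-id", "cf_clearance": "example-clearance"}
start_url = "http://www.endclothing.com/us/latest-products/latest-sneakers"

kwargs = build_request_kwargs(start_url, tokens)
print(sorted(kwargs["cookies"]))
```

Inside start_requests this would be used as yield scrapy.Request(**kwargs), since scrapy.Request accepts url and cookies as keyword arguments.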
For me it still doesn't work. I have the following flow:
1 - Get a 503 from the cfscrape module
2 - Get the cookies and a 302
3 - Get a 200
Then the Scrapy request is yielded with the tokens attached as cookies.
All Scrapy requests returned 503.
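One thing that may be worth checking in a flow like this: the cfscrape README notes that the tokens only work with the same User-Agent string that was used to obtain them, and Scrapy sends its own default User-Agent unless one is set. The sketch below is only a diagnostic aid under that assumption. The helper name check_clearance and all sample values are hypothetical, and token_user_agent stands in for the second value returned by cfscrape.get_tokens().

```python
def check_clearance(tokens, token_user_agent, request_headers):
    """Return a list of likely problems with a cleared request (hypothetical helper)."""
    problems = []
    # The clearance cookie has to be among the cookies that are sent.
    if "cf_clearance" not in tokens:
        problems.append("cf_clearance cookie is missing")
    # The request has to carry the User-Agent the tokens were issued for.
    if request_headers.get("User-Agent") != token_user_agent:
        problems.append("User-Agent differs from the one used to get the tokens")
    return problems


# Made-up values standing in for: tokens, user_agent = cfscrape.get_tokens(url)
tokens = {"__cfduid": "example-id", "cf_clearance": "example-clearance"}
token_user_agent = "Mozilla/5.0 (example)"

# Headers as they would be passed to scrapy.Request(headers=...).
mismatched = {"User-Agent": "Scrapy/1.0 (example)"}
matched = {"User-Agent": token_user_agent}

print(check_clearance(tokens, token_user_agent, mismatched))
print(check_clearance(tokens, token_user_agent, matched))
```

If the second check fires, passing headers={'User-Agent': user_agent} to scrapy.Request alongside the cookies would be the thing to try.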
I was practicing using Scrapy on a site that apparently just today implemented Cloudflare protection. After a bit of research, I tried cloudflare-scrape.
Site: http://www.endclothing.com/us/latest-products/latest-sneakers (full HTML: http://pastebin.com/CxgH9NzB)
I'm using Python 2.7, Requests 2.8.1, PyExecJS (not sure which version), Node.js 0.10.25.
Along with adding the line "import cfscrape", I overrode Scrapy's start_requests method to use cfscrape.get_tokens() as described in this post: http://stackoverflow.com/a/33290671
Here is my full spider.py file: http://pastebin.com/mHDNw69G
The output is fairly limited. 0 items scraped (I expected 120). No errors in the log, just 503 status on the start_url. Here's the full log: http://pastebin.com/1HLijt5Z