Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

cloudflare-scrape Doesn't Work with Scrapy #36

Closed raresample closed 8 years ago

raresample commented 8 years ago

I was practicing using Scrapy on a site that apparently just today implemented Cloudflare protection. After a bit of research, I tried cloudflare-scrape.

Site: http://www.endclothing.com/us/latest-products/latest-sneakers Full HTML: http://pastebin.com/CxgH9NzB

I'm using Python 2.7, Requests 2.8.1, PyExecJS (not sure which version), Node.js 0.10.25.

Along with adding the line "import cfscrape", I overrode Scrapy's start_requests method to use cfscrape.get_tokens(), as described in this post: http://stackoverflow.com/a/33290671

Here is my full spider.py file: http://pastebin.com/mHDNw69G

The output is fairly limited. 0 items scraped (I expected 120). No errors in the log, just 503 status on the start_url. Here's the full log: http://pastebin.com/1HLijt5Z

Anorov commented 8 years ago

You're missing the cf_clearance cookie.

Try replacing

cookies={'__cfduid': token['__cfduid']},

with

cookies=token

(The variable should probably be named tokens.)
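A minimal sketch of the change suggested above, assuming the start_requests override follows the pattern from the linked Stack Overflow answer. The names build_request_kwargs and cloudflare_start_requests are hypothetical. cfscrape.get_tokens() is used as described in the cloudflare-scrape README, where it returns the cookie dict together with the user-agent string it was obtained with. The third-party imports are kept inside the generator so the helper can be tried without Scrapy or cfscrape installed.

```python
def build_request_kwargs(url, tokens, user_agent):
    """Hypothetical helper: keyword arguments for scrapy.Request.

    Passes the whole cookie dict (including cf_clearance), not just
    __cfduid, plus the user agent the tokens were obtained with.
    """
    return {
        "url": url,
        "cookies": dict(tokens),
        "headers": {"User-Agent": user_agent},
    }


def cloudflare_start_requests(start_urls):
    """Hypothetical generator that a spider's start_requests can delegate to."""
    # Imported lazily so this file loads without the third-party packages.
    import cfscrape
    from scrapy import Request

    for url in start_urls:
        # get_tokens returns (cookie dict, user-agent string).
        tokens, user_agent = cfscrape.get_tokens(url)
        yield Request(**build_request_kwargs(url, tokens, user_agent))


# Example with made-up cookie values (no network access needed):
example = build_request_kwargs(
    "http://example.com/",
    {"__cfduid": "abc", "cf_clearance": "xyz"},
    "Mozilla/5.0 (placeholder)",
)
```

In a spider, start_requests could then simply return cloudflare_start_requests(self.start_urls).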

raheelkhan commented 6 years ago

For me it still doesn't work. I have the following flow:

1 - Get 503 from cfscrape module
2 - Get the cookies and 302
3 - Get 200

Then the Scrapy request is yielded with the tokens attached as cookies.

All Scrapy requests return 503.
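When every Scrapy request still returns 503 after a flow like this, one thing that may be worth ruling out (an assumption, not a confirmed diagnosis of this report) is a mismatch between what cfscrape obtained and what Scrapy actually sends. The cloudflare-scrape README notes that tokens must be used with the same user-agent string they were obtained with. The sketch below uses a hypothetical helper, find_mismatches, to compare the two sides. The cookie names are the ones mentioned in this thread.

```python
# Cookie names taken from this thread.
REQUIRED_COOKIES = ("__cfduid", "cf_clearance")


def find_mismatches(tokens, token_user_agent, sent_cookies, sent_user_agent):
    """Hypothetical check: list differences between the cfscrape result
    and what the outgoing request carries."""
    problems = []
    for name in REQUIRED_COOKIES:
        if name not in tokens:
            problems.append("cfscrape did not return " + name)
        elif sent_cookies.get(name) != tokens[name]:
            problems.append("request does not carry the same " + name)
    if sent_user_agent != token_user_agent:
        problems.append("User-Agent differs from the one used to get the tokens")
    return problems


# Made-up values: the request is missing cf_clearance and uses another agent.
issues = find_mismatches(
    {"__cfduid": "abc", "cf_clearance": "xyz"},
    "Mozilla/5.0 (placeholder)",
    {"__cfduid": "abc"},
    "Scrapy (placeholder)",
)
```

Enabling Scrapy's COOKIES_DEBUG setting logs the cookies sent with each request, which is one way to collect the values to compare.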