Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

Site not supported #86

Closed: shaanlan closed this issue 7 years ago

shaanlan commented 7 years ago

I am using cfscrape to crawl a vetogate.com web page, but it fails.

My code looks like this:

    import cfscrape

    url = 'http://www.rassd.com/cat-38.htm'
    scraper = cfscrape.create_scraper()
    # This call never returns: cfscrape keeps looping inside it.
    content = scraper.get(url).content

cfscrape loops forever in the scraper.get() call.

I followed the trace and found:

1. The first request goes to 'http://www.rassd.com/cat-38.htm', and the HTTP response code is 503. cfscrape sleeps 5 s and executes solve_cf_challenge() successfully (it obtains the params).
2. The second request goes to 'http://www.rassd.com/cdn-cgi/l/chk_jschl' with those params, and the HTTP response code is 302. When cfscrape follows the redirect, the website returns another 503, and the cycle repeats indefinitely.
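For anyone debugging a similar loop, here is a minimal sketch of a guard that records every response and gives up after too many 503s. It relies on the response hooks of requests sessions (the `hooks["response"]` list). `ChallengeLoopGuard` and `attach_guard` are hypothetical names, not part of cfscrape. Whether the requests that cfscrape makes internally pass through the session hooks depends on the cfscrape version, so treat that part as an assumption.

```python
class ChallengeLoopGuard:
    """Hypothetical response hook: log each hop, abort after too many 503s."""

    def __init__(self, max_503=3):
        self.max_503 = max_503
        self.seen = []  # (status_code, url) for every response observed

    def __call__(self, response, *args, **kwargs):
        self.seen.append((response.status_code, response.url))
        count = sum(1 for status, _ in self.seen if status == 503)
        if count > self.max_503:
            raise RuntimeError(
                "challenge page returned %d times; giving up" % count
            )
        return response


def attach_guard(session, max_503=3):
    """Register the guard on a requests-style session and return it."""
    guard = ChallengeLoopGuard(max_503)
    session.hooks["response"].append(guard)
    return guard


# Usage sketch (assumes cfscrape is installed and routes its requests
# through the session's hooks):
#
#     import cfscrape
#     scraper = cfscrape.create_scraper()
#     guard = attach_guard(scraper)
#     scraper.get('http://www.rassd.com/cat-38.htm')
#     print(guard.seen)
```

If the guard fires, `guard.seen` holds the sequence of status codes and URLs, which makes the 503, 302, 503 pattern described above easy to confirm.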

I used Fiddler to analyze the HTTP traffic between a web browser and the website, and found that 2 cookies are set during the exchange: one in the first HTTP response and one in the second. But when I run cfscrape, the second HTTP response does not carry a cookie. I don't know what is wrong.
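To compare against the Fiddler capture, one option is to list which hops of a request actually sent a Set-Cookie header. The sketch below uses only the `history`, `status_code`, `url` and `headers` attributes of a requests response. `set_cookie_report` is a hypothetical helper name, and it only helps for requests that eventually return.

```python
def set_cookie_report(response):
    """Return (status, url, Set-Cookie header or None) for every hop.

    `response.history` holds the redirect responses that led to the
    final response, oldest first.
    """
    hops = list(response.history) + [response]
    return [
        (r.status_code, r.url, r.headers.get("Set-Cookie"))
        for r in hops
    ]


# Usage sketch (assumes `scraper` is a requests-compatible session):
#
#     resp = scraper.get(url)
#     for status, hop_url, cookie in set_cookie_report(resp):
#         print(status, hop_url, cookie)
```

A hop that shows `None` where the browser capture shows a cookie is the place to look more closely.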

could you give some help? thanks

Anorov commented 7 years ago

Sorry about that. Can you do a git pull or pip install --upgrade cfscrape and try again?

shaanlan commented 7 years ago

Thanks for your reply. I upgraded cfscrape and ran it again. The resulting HTML page shows "Error 523, Origin is unreachable" and suggests "Please try again in a few minutes". I tried several times, but it failed every time.
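The "try again in a few minutes" advice can be sketched as a small retry loop. This assumes the error page arrives with HTTP status code 523, which the thread does not confirm. `get_with_retries` and its parameters are hypothetical; `fetch` is any callable that takes a URL and returns an object with a `status_code`, for example `scraper.get`.

```python
import time


def get_with_retries(fetch, url, attempts=3, delay=60.0,
                     retry_statuses=(523,), sleep=time.sleep):
    """Call fetch(url) until the status is not in retry_statuses.

    Returns the last response, even if every attempt hit a retryable status.
    """
    response = None
    for attempt in range(attempts):
        response = fetch(url)
        if response.status_code not in retry_statuses:
            return response
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return response


# Usage sketch (assumes `scraper` was created with cfscrape.create_scraper()):
#
#     resp = get_with_retries(scraper.get, 'http://www.rassd.com/cat-38.htm')
#     print(resp.status_code)
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.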

shaanlan commented 7 years ago

Sorry, my mistake: the failure was due to my environment. When I run it on my laptop, the cfscrape result is fine. I also tried it on several remote machines in different locations; some runs succeeded and some failed, and the execution time is long (several minutes per page). Anyway, cfscrape gets what I need. Thanks a lot~