Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

Error executing Cloudflare IUAM Javascript in Jupyter Notebook #219

Closed wchenaf closed 5 years ago

wchenaf commented 5 years ago

The issue occurs when I'm scraping http://javlibrary.com/cn/. It used to work fine.

wchenaf commented 5 years ago

ReferenceError: atob is not defined
    at evalmachine.<anonymous>:1:486
    at evalmachine.<anonymous>:1:785
    at Script.runInContext (vm.js:107:20)
    at Script.runInNewContext (vm.js:113:17)
    at Object.runInNewContext (vm.js:296:38)
    at [eval]:1:27
    at Script.runInThisContext (vm.js:96:20)
    at Object.runInThisContext (vm.js:303:38)
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (internal/modules/cjs/loader.js:688:30)
evalmachine.<anonymous>:1
var s,t,o,p,b,r,e,a,k,i,n,g,f, jcUgMaF={"JBRto":+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(+[])+(!+[]+!![]))}; ;jcUgMaF.JBRto-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+[])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]));jcUgMaF.JBRto*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!

wchenaf commented 5 years ago

Command '['node', '-e', 'console.log(require(\'vm\').runInNewContext(\'var s,t,o,p,b,r,e,a,k,i,n,g,f, fdIskPp={"XZwKXLj":+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))}; ;fdIskPp.XZwKXLj+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]));fdIskPp.XZwKXLj-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![])+(+!![])+(+!![]));fdIskPp.XZwKXLj-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))
;fdIskPp.XZwKXLj=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]));fdIskPp.XZwKXLj=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/(+(+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+[])))+(function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))"to"+String["name"][1]+(false+"")[1]+(true+"")[1]+Function("return 
escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}(+((+!![]+[])))));fdIskPp.XZwKXLj+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+!![]+[])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]));fdIskPp.XZwKXLj-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]))/+((+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![]));fdIskPp.XZwKXLj*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]));fdIskPp.XZwKXLj+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+
(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]));fdIskPp.XZwKXLj-=function(p){var p = eval(eval(atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))"to"+String["name"]+")."+atob("aW5uZXJIVE1M"))); return +(p)}();fdIskPp.XZwKXLj+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]));a.value = (+fdIskPp.XZwKXLj).toFixed(10); ; 121\', Object.create(null), {timeout: 5000}));']' returned non-zero exit status 1.
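Both traces fail on atob(), which is a browser function. The challenge script is run through vm.runInNewContext(..., Object.create(null)), which gives it a fresh context with the JavaScript built-ins but, as far as I can tell, without browser globals such as atob or document. Decoding the two base64 literals that the trace passes to atob() is a quick way to see what the script expects. Combined with the surrounding expressions, they appear to build document.getElementById(...).innerHTML, which suggests that supplying atob alone may not be enough. A small Python check, with the literals copied from the trace above (the helper name is hypothetical):

```python
import base64

# The two base64 literals passed to atob() in the challenge text pasted above.
ENCODED = ["ZG9jdW1l", "aW5uZXJIVE1M"]


def decode_atob_literals(literals):
    """Decode the strings the challenge would have run through atob()."""
    return [base64.b64decode(s).decode("ascii") for s in literals]


for encoded, decoded in zip(ENCODED, decode_atob_literals(ENCODED)):
    print(f"{encoded} -> {decoded}")
# ZG9jdW1l -> docume
# aW5uZXJIVE1M -> innerHTML
```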

lukele commented 5 years ago

Try this pull request: https://github.com/Anorov/cloudflare-scrape/pull/206

EdmundMartin commented 5 years ago

@lukele Is there any reason this pull request hasn't been accepted? It would be nice to be able to install it via pip from PyPI during the build process. Or has the maintainer stopped actively accepting pull requests?

This atob issue only appears to occur on certain sites, which lines up with what lukele has put in his pull request.

lukele commented 5 years ago

@EdmundMartin I think @Anorov just hasn't had a chance to look at it yet. Also Cloudflare seems to change their code every few days. Hard to keep up...

wchenaf commented 5 years ago

Do I just download it and replace the __init__.py file? It still doesn't work.

lukele commented 5 years ago

@wchenaf yes, that should work. What error are you seeing?

Aniz74 commented 5 years ago

Hello lukele. First, good work. Your first update was working, but after Cloudflare updated again it no longer works. This is the error: https://i.imgur.com/sw2kvYk.png. Your last update was 12 days ago, but Cloudflare updated on 31 March.

lukele commented 5 years ago

If you can share the URL you are trying this on, I'll have a look.

Aniz74 commented 5 years ago

Try this one: http://www.javlibrary.com/en/search.php

ghost commented 5 years ago

Actually, Cloudflare deployed an update just yesterday. https://github.com/codemanki/cloudscraper/issues/179

Aniz74 commented 5 years ago

Good, but I prefer the classic version; it works better on my side. I'm still waiting for lukele's fix. His last fix was very good.

VeNoMouS commented 5 years ago

@Aniz74 try mine then; it uses the classic js2py approach from before @Anorov switched over to Node.js.

https://github.com/VeNoMouS/cloudflare-scrape-js2py

Aniz74 commented 5 years ago

There are a lot of sites where your version doesn't work.

VeNoMouS commented 5 years ago

@Aniz74 can you give some examples please?

Aniz74 commented 5 years ago

Yes, this is the error that shows up on 90% of the sites I have tested:

https://i.imgur.com/iPkRYmL.png

EdmundMartin commented 5 years ago

@VeNoMouS I have tried your version of the library and it appears to fail on the same set of sites as the core implementation does. The Location header is not present in the response, so parsing the redirect fails.

I can message you some example sites if you are interested.

Update: for the sites I was having issues with, it appears that multiple rounds of challenge solving are required to get a 200 status code. I have been able to make VeNoMouS's fork work with these sites with a couple of changes.
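The failure described above is a redirect-style response that carries no Location header, so code that reads headers['Location'] directly breaks. A minimal sketch of guarding that lookup, assuming you have the status code, a header mapping and the request URL; redirect_target is a hypothetical helper, not part of cfscrape, and it does nothing Cloudflare-specific:

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

# Status codes that are expected to carry a Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(status: int, headers: Mapping[str, str], base_url: str) -> Optional[str]:
    """Return the absolute URL to follow, or None when there is nothing to follow."""
    if status not in REDIRECT_STATUSES:
        return None
    for name, value in headers.items():
        # HTTP header names are case-insensitive; a plain dict is not.
        if name.lower() == "location" and value:
            # Location may be relative, so resolve it against the request URL.
            return urljoin(base_url, value)
    # A redirect status without a Location header: report it instead of crashing.
    return None


print(redirect_target(301, {"Location": "http://www.javlibrary.com/cn/"}, "http://javlibrary.com/cn/"))
# http://www.javlibrary.com/cn/
print(redirect_target(302, {"Content-Type": "text/html"}, "http://javlibrary.com/cn/"))
# None
```

The 301 case mirrors the javlibrary.com to www.javlibrary.com redirect that shows up in the request dump later in this thread.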

VeNoMouS commented 5 years ago

@EdmundMartin yes please, I'd like to resolve any issues that are occurring... the problem is no one is posting example URLs...

VeNoMouS commented 5 years ago

@pro-src tagging you in; you may be interested.

lukele commented 5 years ago

I'm currently seeing CAPTCHA requests on those sites, but I don't yet understand what has changed: the challenge is solved correctly, and the headers cfscrape and the browser send are the same.

lukele commented 5 years ago

@EdmundMartin Does http://www.javlibrary.com/en/search.php work for you? I wonder if we are also seeing different results based on the IP

VeNoMouS commented 5 years ago
#!/usr/bin/python

from lib import cfscrape
from lib import requests
import sys


class Test():
    def __init__(self):
        self.site_url = "http://javlibrary.com/cn/"
        # Shared session: cookies set while solving the challenge stay here
        self.session = requests.session()
        # Returns the calling function's name, e.g. "test()", for log prefixes
        self.funcName = lambda n=0: sys._getframe(n + 1).f_code.co_name + "()"

    def _cloudFlare(self, response):
        # Wrap the shared session so the solved-challenge cookies land in self.session
        cf = cfscrape.create_scraper(sess=self.session)
        cf.set_cloudflare_challenge_delay(0)

        if cf.is_cloudflare_challenge(response):
            print("{} requested URL - {}, encounted CloudFlare DDOS Protection.. Bypassing.".format(self.funcName(), response.url))

            response = cf.get(self.site_url, timeout=30)

            if not cf.is_cloudflare_challenge(response):
                return (True, True)   # challenge seen and solved

            return (True, False)      # challenge seen, still unsolved

        return (False, True)          # no challenge on this response

    def test(self):
        ret = self.session.get(self.site_url, timeout=30)
        if (True, True) == self._cloudFlare(ret):
            print("{} CloudFlare DDOS Protection.. Bypassed successfully.".format(self.funcName()))
            # Plain requests call on the same session, reusing its cookies
            ret = self.session.get(self.site_url, timeout=30)
            print(ret.content)


Test().test()
_cloudFlare() requested URL - http://javlibrary.com/cn/, encounted CloudFlare DDOS Protection.. Bypassing.
test() CloudFlare DDOS Protection.. Bypassed successfully.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="zh-cn" dir="ltr" xmlns:og="http://ogp.me/ns#">
<head>
<title>欢迎光临JavLibrary,你的线上日本成人影片情报站。 - JAVLibrary</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="shortcut icon" href="/favicon.ico" />
<link href="../js/main.css?1553017669" type="text/css" rel="stylesheet" />
<script src="../js/jquerylibrary.min.js?1553017669" type="text/javascript"></script>
<script src="../js/main.min.js?1553017669" type="text/javascript"></script>
<meta name="Description" content="你的线上日本成人影片情报站。管理你的影片并分享你的想法。" />
<!-- ********* Start of Custom Header *********  -->
<meta property="og:url" content="http://www.javlibrary.com/cn/">
<meta property="og:site_name" content="JavLibrary.com">
<meta property="og:title" content="欢迎光临JavLibrary,你的线上日本成人影片情报站。 - JAVLibrary" />
<meta property="og:type" content="video.movie" />
<meta property="og:description" content="你的线上日本成人影片情报站。管理你的影片并分享你的想法。" />
<link rel='shortlink' href="http://www.javlibrary.com/cn/"/>
<link rel="canonical" href="http://www.javlibrary.com/cn/"/>
<link rel="image_src" href="http://www.javlibrary.com/img/logo-icon.png">
<meta property="og:image" content="http://www.javlibrary.com/img/logo-icon.png" />
<link rel="alternate" href="rss.php" title="RSS feed" type="application/rss+xml" />
<script src="../js/videocensored.lang.js.php?hl=cn&1553017669" type="text/javascript" charset="UTF-8"></script>
<script type="text/javascript" src="../js/vl_addfav.min.js?1553017669"></script>
<script type="text/javascript">
var $usingAjax = false;
</script>
<!-- ********* End of Custom Header *********  -->
</head>
<body class="main">
<!-- ********* Start of Top Menu *********  -->
<div id="topmenu">
        <div class="searchbar">
                <form name="searchbar" method="get" action="vl_searchbyid.php">
                <table>
                <tr>
                        <td>
                                <input type="text" name="keyword" id="idsearchbox" value="" />
                                <div id="idsearchboxmask" >&lt;例: TT-013&gt;</div>
                        </td>
                        <td><input type="button" value="识别码搜寻" id="idsearchbutton" /></td>
                        <td class="advsearch">&nbsp;&nbsp;[<a href="search.php">进阶搜寻</a>]</td>
                        <td class="socialmedia">
                                <a href="https://twitter.com/share" class="twitter-share-button media" data-url="http://www.javlibrary.com" data-text="JAVLibary">Tweet</a>
                                <div class="fb-like media" data-href="http://www.javlibrary.com" data-layout="button_count" data-width="100" data-show-faces="false" data-font="arial" style="margin:0px 10px 0px 0px; padding-bottom: 5px;"></div>
                                <div id="fb-root"></div>
                        </td>
                </tr>
                </table>
                </form>
        </div class="searchbar">
        <div class="menutext">
                <script type="text/javascript">
                <!--
                if (getCookie("session") == null || getCookie("userid") == null ) {
                        document.write('<a href="login.php">登入</a> | <a href="register.php">登录</a>');
                } else {
                        if (getCookie("hasmsg") == null) {
                                document.write('<a href="myaccount.php">我的帐号</a> | <a href="pm_inbox.php">收件夹</a> | <a href="logout.php">登出</a>');
                        } else {
                                document.write('<a href="myaccount.php">我的帐号</a> | <img src="../img/img_star.gif" width="25" height="25" border="0" style="vertical-align: middle;"><a href="pm_inbox.php">收件夹</a> | <a href="logout.php">登出</a>');
                        }
                }
                //-->
                </script>
        </div class="menutext">
</div id="topmenu">
<div id="toplogo">
        <div class="sitelogo">
                <a href="./" rel='bookmark'><img src="../img/logo-top.png" width="350" height="50" border="0" title="Jav Library.com - Japanese Adult Video Library"></a>
        </div class="sitelogo">
        <div class="topbanner1" id="topbanner11" style="position:absolute; left:380px; top:5px; width:728px; height:92px; overflow:hidden;">
        <script type='text/javascript' src='../js/bnr_fun88_1.js?1553017669'></script>  </div class="topbanner">
                <div class="languagemenu">
        Language:
        <select onChange="if (document.location.href.toLowerCase().indexOf('u29k.com') >= 0) {  document.location.replace(document.location.href.replace(/&page=[0-9]+/i, '').replace('u29k.com/cn/', 'u29k.com/'+this.options[selectedIndex].value+'/'));} else if (document.location.href.toLowerCase().indexOf('d28k.com') >= 0) {   document.location.replace(document.location.href.replace(/&page=[0-9]+/i, '').replace('d28k.com/cn/', 'd28k.com/'+this.options[selectedIndex].value+'/'));} else if (document.location.href.toLowerCase().indexOf('v27f.com') >= 0) {   document.location.replace(document.location.href.replace(/&page=[0-9]+/i, '').replace('v27f.com/cn/', 'v27f.com/'+this.options[selectedIndex].value+'/'));} else {document.location.replace(document.location.href.replace(/&page=[0-9]+/i, '').replace('javlibrary.com/cn/', 'javlibrary.com/'+this.options[selectedIndex].value+'/'));}">
                <option value="en">English</option>

etc...

works fine for me...

VeNoMouS commented 5 years ago

Did you delete the old cfscrape install before installing mine?!?

EdmundMartin commented 5 years ago

@lukele The JavLibrary example works for me with @VeNoMouS's code. It could potentially be the IP. Based on my understanding, site owners can set up Google reCAPTCHA pages for certain IPs/ASNs, so it could be that your IP or ASN is blocked.

lukele commented 5 years ago

@VeNoMouS I'll try yours now. Mine is still based on Anorov's code with Node.js. If I see the CAPTCHA with your version as well, then my IP is what triggers it.

ghost commented 5 years ago

:thinking: If you're getting a challenge, you can bypass that challenge. If you're getting a captcha after a challenge answer has been submitted, you failed to bypass that challenge.

@lukele Just visit the page in your web browser...

lukele commented 5 years ago

@pro-src I'm solving the challenge in the browser with JavaScript turned off and see the same result as when solving it using cfscrape, so that's not it. Interestingly, though, if I use @VeNoMouS's version directly it doesn't work. If I use his test code, however, it does. Trying to figure out why...

ghost commented 5 years ago

You should save the request headers, response headers, and response body into a gist so we can collaborate properly. There's a bit too much guessing going on here.

https://stackoverflow.com/questions/10588644/how-can-i-see-the-entire-http-request-thats-being-sent-by-my-python-application
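For capturing the request and response headers suggested above, one standard-library option (an alternative to requests_toolbelt, not something cfscrape provides) is http.client's debug output, which requests should pick up because urllib3's connection class is built on http.client. It prints the raw request line and headers plus the response status and headers; response bodies are not printed. A sketch, where enable_http_debug is a hypothetical name:

```python
import http.client
import logging


def enable_http_debug() -> None:
    """Turn on http.client's wire-level prints and urllib3's debug logging."""
    # Class-level flag: every HTTPConnection (and subclass) prints send:/reply:/header: lines.
    http.client.HTTPConnection.debuglevel = 1
    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    # urllib3's logger reports connection setup, redirects and retries.
    urllib3_log = logging.getLogger("urllib3")
    urllib3_log.setLevel(logging.DEBUG)
    urllib3_log.propagate = True


enable_http_debug()
# ...then make the scraper / requests calls as usual and copy the console output into a gist.
```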

VeNoMouS commented 5 years ago

nah fuck that @pro-src here use this

from requests_toolbelt.utils import dump

then, after your requests call (which populates the variable ret, for example):

data = dump.dump_all(ret)
print(data.decode('utf-8'))
VeNoMouS commented 5 years ago

You might not want to decode('utf-8') since you're dealing with Japanese and Chinese text.
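UTF-8 itself handles Chinese and Japanese text; what makes .decode('utf-8') raise is a body that is not valid UTF-8, for example a page served in another charset. A minimal sketch of two ways around that when working with the bytes returned by dump.dump_all(): decode with replacement characters, or write the raw bytes to a file (say, for a gist) without decoding at all. Both helper names are hypothetical:

```python
from pathlib import Path


def dump_to_text(raw: bytes) -> str:
    """Decode a request/response dump for printing without crashing on odd bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid sequences become U+FFFD instead of raising.
        return raw.decode("utf-8", errors="replace")


def save_dump(raw: bytes, path: str) -> None:
    """Write the dump byte-for-byte, sidestepping decoding entirely."""
    Path(path).write_bytes(bytes(raw))
```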

But if I replace print(ret.content) in my test function with

data = dump.dump_all(ret)
print(data)

the output is..

_cloudFlare() requested URL - http://javlibrary.com/cn/, encounted CloudFlare DDOS Protection.. Bypassing.
test() CloudFlare DDOS Protection.. Bypassed successfully.
< GET /cn/ HTTP/1.1
< Host: javlibrary.com
< Connection: keep-alive
< Accept-Encoding: gzip, deflate
< Accept: */*
< User-Agent: python-requests/2.10.0
< Accept-Language: en-US,en;q=0.9
< DNT: 1
< Cookie: cf_clearance=5af76662ea0d2f336e32232c5a8339be6ad05310-1554372370-3600-150; __cfduid=d4f355a02cfd8ce8667b53550e8148aab1554372365
<

> HTTP/1.1 301 Moved Permanently
> Date: Thu, 04 Apr 2019 10:06:11 GMT
> Content-Type: text/html; charset=iso-8859-1
> Content-Length: 237
> Connection: keep-alive
> Location: http://www.javlibrary.com/cn/
> Server: cloudflare
> CF-RAY: 4c2277d90b0489e0-AKL
>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.javlibrary.com/cn/">here</a>.</p>
</body></html>
< GET /cn/ HTTP/1.1
< Host: www.javlibrary.com
< Connection: keep-alive
< Accept-Encoding: gzip, deflate
< Accept: */*
< User-Agent: python-requests/2.10.0
< Accept-Language: en-US,en;q=0.9
< DNT: 1
< Cookie: cf_clearance=5af76662ea0d2f336e32232c5a8339be6ad05310-1554372370-3600-150; __cfduid=d4f355a02cfd8ce8667b53550e8148aab1554372365
<

> HTTP/1.1 200 OK
> Date: Thu, 04 Apr 2019 10:06:12 GMT
> Content-Type: text/html; charset=UTF-8
> Transfer-Encoding: chunked
> Connection: keep-alive
> X-Powered-By: PHP/5.5.20
> Last-Modified: Thu, 04 Apr 2019 00:00:00 GMT
> Expires: Fri, 05 Apr 2019 00:00:00 GMT
> ETag: W/"bab70eb990dba01f1696313e2bbb52e6"
> Server: cloudflare
> CF-RAY: 4c2277dc6ef5a41d-AKL
> Content-Encoding: gzip
>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="zh-cn" dir="ltr" xmlns:og="http://ogp.me/ns#">
<head>
<title>欢迎光临JavLibrary,你的线上日本成人影片情报站。 - JAVLibrary</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="shortcut icon" href="/favicon.ico" />

You can dump anywhere in the library...

The best place would be CloudflareScraper.request():

    def request(self, method, url, *args, **kwargs):
        self.headers['Accept-Encoding'] = 'gzip, deflate'
        self.headers['Accept-Language'] = 'en-US,en;q=0.9'
        self.headers['DNT'] = '1'

        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)

and dump resp right after that call.

ghost commented 5 years ago

I spy a bug.

VeNoMouS commented 5 years ago

OoOH where abouts?

ghost commented 5 years ago

What's with the UA?

VeNoMouS commented 5 years ago

That's from test()... because it's already passed the challenge..

VeNoMouS commented 5 years ago

We talked last night about the UA only needing to be the same during the challenge... When I'm dumping in the above example, the challenge was already solved and delivered.

So at that point we only need the __cfduid cookie... the UA does not need to match the challenge at that point.

VeNoMouS commented 5 years ago

Remember, I'm sharing my session here: cf = cfscrape.create_scraper(sess=self.session), so it populates the cookies once the challenge is solved. Then I can go back to using normal requests calls... as long as I'm using the shared session.

And since we only set the UA for the challenge, normal requests calls use the default headers again... hence the default "requests" UA.

Initial call (default UA) -> get 503 -> request with modified UA -> send solved challenge with modified UA -> business as usual with the default UA, because we have the cookie now...
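The flow above relies on the cookie jar, not the headers, carrying state between requests. A minimal sketch of that client-side behaviour, using the standard library's http.cookiejar in place of a requests Session (an assumption made to keep the example dependency-free); the domain, cookie value and helper names are hypothetical placeholders, not a real clearance token:

```python
import http.cookiejar
import urllib.request

# Hypothetical cookie domain standing in for the real site.
COOKIE_DOMAIN = ".example.com"


def make_jar_with_clearance(value: str) -> http.cookiejar.CookieJar:
    """Simulate the state right after a challenge was solved: the jar holds cf_clearance."""
    jar = http.cookiejar.CookieJar()
    jar.set_cookie(http.cookiejar.Cookie(
        version=0, name="cf_clearance", value=value,
        port=None, port_specified=False,
        domain=COOKIE_DOMAIN, domain_specified=True, domain_initial_dot=True,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    ))
    return jar


def cookie_header_for(jar: http.cookiejar.CookieJar, url: str, user_agent: str) -> str:
    """Return the Cookie header the shared jar attaches to a request with this User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    jar.add_cookie_header(req)  # the same step an opener with HTTPCookieProcessor performs
    return req.get_header("Cookie", "")


jar = make_jar_with_clearance("placeholder-token")
for ua in ("Mozilla/5.0 (placeholder browser UA)", "python-requests/2.10.0"):
    print(ua, "->", cookie_header_for(jar, "http://www.example.com/cn/", ua))
```

Once cf_clearance is in the jar, it is attached to every matching request whatever User-Agent that request sends. Whether Cloudflare then accepts it with a different User-Agent is decided server-side; the dump earlier in this thread is the only evidence here that it did.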

VeNoMouS commented 5 years ago

@pro-src send me your email so I can send you an invite for instant messaging.

Aniz74 commented 5 years ago

VeNoMouS, so you confirm that you have solved javlibrary with your code? Or do you need to push another update?

VeNoMouS commented 5 years ago

@Aniz74 no, mine works 100% of the time as far as I'm aware...

lukele commented 5 years ago

@VeNoMouS Unfortunately, on my computer yours is hit and miss as well if I don't create a session first and call the URL with that session. It is, however, quite reliable using the session. And it turns out so was my version all along.

VeNoMouS commented 5 years ago

You should always use a session, my friend.... otherwise you're creating massive delays doing the challenge all the time.... you only need to solve it once with a shared session :)

lukele commented 5 years ago

@VeNoMouS not necessarily. The cf_clearance cookie is stored and sent along with all future requests. At least that's what I'm seeing.

Aniz74 commented 5 years ago

lukele, can you also update your version so I can test it? Thanks..

wchenaf commented 5 years ago

It seems you guys have conquered the challenge. So... how can I get it to work?

Aniz74 commented 5 years ago

Still hoping for lukele's version. It was the only one that worked with my sites.

VeNoMouS commented 5 years ago

@wchenaf you could grab a copy of my latest code... https://github.com/VeNoMouS/cloudflare-scrape-js2py just delete the original install of cfscrape and install mine ..

wchenaf commented 5 years ago

urllib3.connectionpool:Connection pool is full, discarding connection: www.javlibrary.com

wchenaf commented 5 years ago

Ran it again and it's working!

bjgood commented 5 years ago

@VeNoMouS or @pro-src What about people for whom js2py is not available, like on a Raspberry Pi 3 running LibreELEC (Kodi)?

Will another version be made available without that dependency?

Thank you.

lukele commented 5 years ago

@bjgood if you can run nodejs, you could continue using this version with my modifications: https://github.com/lukele/cloudflare-scrape/tree/update-challenge-solver