Krylanc3lo closed this issue 5 years ago
I think that it is related to #212
I think that we have to add the following at the beginning of the node -e command:
if (typeof atob === 'undefined') {
global.atob = function (b64Encoded) {
return new Buffer(b64Encoded, 'base64').toString('binary');
};
}
`$ node -e "global.Buffer = global.Buffer || require('buffer').Buffer;if (typeof atob === 'undefined') {global.atob = function (str) {return new Buffer(str, 'base64').toString('binary');};}console.log(atob('Hello'));"
ée `
Thank you Pawel, I will try it
You mean adding this code to the base.py script?
We have to edit this line: js = "console.log(require('vm').runInNewContext('%s', Object.create(null), {timeout: 5000}));" % js. I will create a pull request when I get back home.
js2py has not been used for a long time. First of all, update your code please (and install node).
Thanks Lukas for the suggestion, using node & updating the code led to the same error but at least I am using the latest version:
ReferenceError: atob is not defined
at evalmachine.
I will try to implement Pawel's suggestion
It is more complicated than I first thought. But we can replace the content between atob("ZG9jdW1l") and atob("aW5uZXJIVE1M") (which evaluates to 'document.getElementById(k).innerHTML') with the text inside the HTML element whose id is given by the k variable.
`(function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}(+((!+[]+!![]+!![]+[])))) `
And we get 'Cannot read property 'charCodeAt' of undefined' because we are not passing the t variable to the nodejs call.
Ok. I finally got it. I will provide code tomorrow.
Great! looking forward to it.
Thanks again Pawel!
@pawliczka do you have any pseudo code that we can implement in the meantime, for those with projects that rely on bypassing cf?
To solve the problem with the undefined atob,
you have to replace atob("ZG9jdW1l")....atob("aW5uZXJIVE1M"),
which for me is:
atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")."+atob("aW5uZXJIVE1M")
with the data under the element identified by the variable k. For me k = 'cf-dn-lZTYtMjTTnWU'; and the element is:
<div style="display:none;visibility:hidden;" id="cf-dn-lZTYtMjTTnWU">+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((+!![]+[])+(+!![])+(+[])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))</div>
You also have to solve the problem with TypeError: Cannot read property 'charCodeAt' of undefined, and the problem with a.value.
You can do it this way:
js = js.replace('a.value','a')
js = js.replace("; 121",'')
js = "console.log(require('vm').runInNewContext('var a; var t = \"%s\";%s', Object.create(null), {timeout: 5000}));" % (domain, js)
Now it should work fine. I think that cf serves the new challenge algorithm only to some users. Some domains are still using the old challenge algorithm, as in #212.
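The replacement described above needs the id and the inner text of the hidden div. A minimal Python sketch of that lookup, assuming the challenge page keeps the exact style attribute quoted in this thread; the name extract_hidden_div and the sample HTML are hypothetical and not part of cfscrape:

```python
import re

# Matches the hidden <div> that carries the obfuscated arithmetic expression.
# Assumes the exact style attribute seen in this thread; it may change.
HIDDEN_DIV = re.compile(
    r'<div style="display:none;visibility:hidden;" id="(?P<id>[^"]+)">(?P<expr>.*?)</div>',
    re.DOTALL,
)


def extract_hidden_div(body):
    """Return (element id, inner text) of the hidden div, or None if absent."""
    match = HIDDEN_DIV.search(body)
    if match is None:
        return None
    return match.group("id"), match.group("expr")


# Hypothetical, shortened sample page fragment.
sample = '<div style="display:none;visibility:hidden;" id="cf-dn-abc">+((!+[]+!![]))/+((+!![]+[]))</div>'
div_id, expr = extract_hidden_div(sample)
print(div_id)
print(expr)
```

The returned expression can then stand in for the document.getElementById(k).innerHTML lookup, since there is no DOM in node or js2py.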
In which file do you find the atob function?
atob is available for node as an npm package: https://www.npmjs.com/package/atob. @Krylanc3lo personally I backport most of the changes and still use js2py, and avoid node at all costs..
Thanks @VeNoMouS. Do you know what I have to update on js2py side ?
atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]
= document
+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")."
= .getElementById(k).
+atob("aW5uZXJIVE1M")
= innerHTML
document.getElementById(k).innerHTML
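The decoding above can be checked outside a JavaScript engine. A small Python sketch, where atob is a hypothetical helper that only approximates the browser function for well-formed input:

```python
from base64 import b64decode


def atob(b64_encoded):
    """Rough Python counterpart of the browser's atob for well-formed input."""
    return b64decode(b64_encoded).decode("latin-1")


# JSFuck fragments from the thread: (undefined+"")[1] and (true+"")[0]
n = "undefined"[1]
t = "true"[0]

print(atob("ZG9jdW1l") + n + t)  # document
print(atob("aW5uZXJIVE1M"))      # innerHTML
```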
@Krylanc3lo looking into it myself
@VeNoMouS Could you please send me a diff or link for your fork when you are done? I'm going sleep now.
@pawliczka sweet as mate :)
lol this JSFuck is really annoying when trying to work out what it's attempting to do...
@VeNoMouS I just wrote some code to take the pain out of it. https://github.com/codemanki/cloudscraper/issues/170#issuecomment-478203909
@pro-src im just doing the same in python ;P nice job :)
@VeNoMouS Also here is a node based definition for atob.
function atob(str) {
return Buffer.from(str, 'base64').toString('binary');
}
ah thanks, i ended up just replacing it with regex base64'd till i got it all working
so.... my rewrite produced this... its a bit of a hack atm...
however, it breaks under js2py... I'm trying to work that out.
File "/usr/local/lib/python2.7/dist-packages/js2py/base.py", line 1001, in callprop
'%s is not a function' % cand.typeof())
js2py.internals.simplex.JsException: TypeError: 'undefined' is not a function
Ok ... i found one of the root causes of the "undefined" but still got another issue i think...
("")["italics"]()
is str.italics()
... js2py doesn't know how to handle it..
import logging
import random
import re
from pprint import pprint
from base64 import b64decode
from copy import deepcopy
from time import sleep
#from lib import js2py
import js2py
from lib.requests.sessions import Session
try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse
__version__ = "1.9.5"
# Orignally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT
DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"
]
DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)
BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""
ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""
class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)
        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )

    def request(self, method, url, *args, **kwargs):
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)
        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        sleep(self.delay) # Cloudflare requires a delay before solving the challenge
        body = resp.text
        rq = re.search('<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>', body,re.MULTILINE | re.DOTALL)
        body = re.sub(
            r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
            "{};".format(rq.group(2)),
            body
        )
        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        submit_url = "%s://%s/cdn-cgi/l/chk_jschl" % (parsed_url.scheme, domain)
        cloudflare_kwargs = deepcopy(original_kwargs)
        params = cloudflare_kwargs.setdefault("params", {})
        headers = cloudflare_kwargs.setdefault("headers", {})
        headers["Referer"] = resp.url
        try:
            params["jschl_vc"] = re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)
            params["pass"] = re.search(r'name="pass" value="(.+?)"', body).group(1)
            params["s"] = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')
        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e.message, BUG_REPORT))
        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)
        pprint(params)
        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        cloudflare_kwargs["allow_redirects"] = False
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        pprint(redirect.content)
        #exit()
        redirect_location = urlparse(redirect.headers["Location"])
        if not redirect_location.netloc:
            redirect_url = "%s://%s%s" % (parsed_url.scheme, domain, redirect_location.path)
            return self.request(method, redirect_url, **original_kwargs)
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                           "s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)
        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))
        js = js.replace('; 121', '')
        js = js.replace('function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}', 't.charCodeAt')
        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)
        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)
        try:
            js = "a = {}; t = \"" + domain + "\";" + js
            result = js2py.eval_js(js)
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise
        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)
        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)
        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)
        return scraper

    ## Functions for integrating cloudflare-scrape with other applications and scripts
    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent
        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise
        domain = urlparse(resp.url).netloc
        cookie_domain = None
        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")
        return ({
                    "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                    "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
                },
                scraper.headers["User-Agent"]
               )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string
I dunno... it seems like it works... but something is wrong...
I'm using the #206 implementation and I'm constantly getting KeyError: 'location'. With your implementation it is the same :( I think something else has changed.
We are getting 403 code:
@pawliczka yea confirmed, 403 for me as well.
I also noticed my s param, is always longer ie
3a6002246ad63c4993313cb0399bdbb8d0e9b45b-1553942508-1800-AbDVH86ld1XqRlLVE9OWGYQOVasTx6qOsfFLmhzyZnkx+QSWtR/E4MrwizZGjZW9QnofW4wm0DzHcJVZQh1U/ZRaq35yTt/2nkpRKwwbgo5erVnZ9xN+JWP4QLj7SKG76S2TQ3GMNP0x27IOkvOCiYQ=
than say burp..
3273ae847b6f60cb064e7e226833bb895e0a27aa-1553940805-1800-AVa7UnzBT0LN9tEsGhdyuYaJkEn1iQQAXJQeBZ3rcvm2gL8EUBuPGRkFbwGBIwdhhrs4ngAeCZPnudHrEUagggBAdI9BDvLXMme9lksX1Q3DcTkkPneTDg554HRjJ3cbvQ==
I think that this is something related to the '+' at the end of s. But I do not know what the + means :((
+JWP4QLj7SKG76S2TQ3GMNP0x27IOkvOCiYQ=
ignore that long vs short and +
just did a fresh burp and got
12abc9621d577b480398b15fe6984c47533e560d-1553942936-1800-Afn+s585Y44v3vwSCBFlWbiLQIUPcCuY/JWwuUrdbWXMh8FtkF38FMVRcQ6fjM3xBxt6TWb0ap+nWvU1AhAQiTMZuutumksS6ScSEChw2xJo8x9efxR5jxeQH6KcY0anXA==
We are lost😒
@pawliczka i dumped out my burp response into a file and injected it into the cfscrape... this is my burp
this is my params payload in cfscrape
{'jschl_answer': '-10.8784096873',
'jschl_vc': '0dd277f3ee7d26fd9fc79497b5d6a8d7',
'pass': '1553942940.979-Xdc4gdP8aN',
's': '12abc9621d577b480398b15fe6984c47533e560d-1553942936-1800-Afn+s585Y44v3vwSCBFlWbiLQIUPcCuY/JWwuUrdbWXMh8FtkF38FMVRcQ6fjM3xBxt6TWb0ap+nWvU1AhAQiTMZuutumksS6ScSEChw2xJo8x9efxR5jxeQH6KcY0anXA=='}
it's identical as far as i can see..
@VeNoMouS ok I see. But you still have 403?
ok @pawliczka got it.. the parameters have to be in specific order...
try:
    #params["jschl_vc"] = re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)
    #params["pass"] = re.search(r'name="pass" value="(.+?)"', body).group(1)
    #params["s"] = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')
    submit_url = '{}?s={}&jschl_vc={}&pass={}&jschl_answer={}'.format(
        submit_url,
        re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value'),
        re.search(r'name="jschl_vc" value="(\w+)"', body).group(1),
        re.search(r'name="pass" value="(.+?)"', body).group(1),
        self.solve_challenge(body, domain)
    )
resulted in
{'Content-Length': '159', 'Server': 'cloudflare', 'Connection': 'keep-alive', 'Location': 'https://ww5.justdubs.me/css/style.css', 'Date': 'Sat, 30 Mar 2019 11:30:08 GMT', 'CF-RAY': '4bf9bff27ce5a41d-AKL', 'Content-Type': 'text/html', 'X-Frame-Options': 'SAMEORIGIN'}
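The ordering requirement can also be met without hand-formatting the URL. A minimal sketch, assuming the order s, jschl_vc, pass, jschl_answer reported above; build_submit_url and the sample values are hypothetical. Unlike the format() call in this thread, urlencode percent-encodes characters such as '+' and '/', and whether Cloudflare requires that is not established here:

```python
from urllib.parse import urlencode  # Python 3; on Python 2 use urllib.urlencode


def build_submit_url(base_url, s, jschl_vc, password, answer):
    """Append the challenge fields in the order s, jschl_vc, pass, jschl_answer."""
    # A list of tuples keeps its order, unlike a plain dict on older Pythons.
    ordered = [
        ("s", s),
        ("jschl_vc", jschl_vc),
        ("pass", password),
        ("jschl_answer", answer),
    ]
    return base_url + "?" + urlencode(ordered)


# Hypothetical, shortened values for illustration only.
url = build_submit_url(
    "https://example.com/cdn-cgi/l/chk_jschl",
    "abc+d/e=",
    "0dd277",
    "1553942940.979-Xdc4",
    "-10.8784096873",
)
print(url)
```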
OMG! Good job
I dunno, something is still broken for me... I've been working on this too long and it's 1am. It looks like it auths and gives me a Location, but it keeps looping between 302 and 503... If you don't resolve it tonight I'll try to pick it back up tomorrow, but I've had enough for tonight.
I replaced params with an OrderedDict, and now I get an instant reCAPTCHA 😆
I did try that, but when i posted, it didnt look in order... shrug, least its going for everyone :)
Thanks for helping. By using the code shared by @VeNoMouS and the params part, I get a 302 error as well.
@pawliczka, the OrderedDictionary part fixes everything ?
@Krylanc3lo nothing. Always captcha
OK thanks
@VeNoMouS
String.prototype.italics = function () {
return '<i>' + this + '</i>'
};
var empty = "";
console.log(empty.italics(), "xyz".italics()); // "<i></i> <i>xyz</i>"
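In the obfuscated challenge, ("")["italics"]() is only used to obtain the capital "C" of charCodeAt, by taking index 2 of the escaped "<i></i>" string. A Python sketch of that single step, where italics is a hypothetical helper and urllib's quote is a stand-in that happens to match JavaScript's escape() for this input (the two differ in general):

```python
from urllib.parse import quote  # Python 3


def italics(text):
    """Python counterpart of the legacy String.prototype.italics."""
    return "<i>" + text + "</i>"


# quote() with its default safe="/" gives the same result as JS escape()
# for this particular string; this is not true for arbitrary input.
escaped = quote(italics(""))
print(escaped)     # %3Ci%3E%3C/i%3E
print(escaped[2])  # C
```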
@pawliczka
I think that this is something related to the '+' at the end of s. But I do not know what the + means :((
+JWP4QLj7SKG76S2TQ3GMNP0x27IOkvOCiYQ=
Base64, as the name suggests, is a base 64 number system that uses 64 digits. The "+" is the second to last digit. If we treat the chars strictly as digits, then the decimal (base 10) value of "+" is 62. The "=" is used for padding. See: https://stackoverflow.com/questions/6916805/why-does-a-base64-encoded-string-have-an-sign-at-the-end
Digits: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
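The digit values can be read straight off that alphabet. A short Python sketch using only the standard library; BASE64_DIGITS is a name chosen here for illustration:

```python
import string
from base64 import b64encode

# Standard Base64 alphabet: A-Z, a-z, 0-9, then "+" and "/".
BASE64_DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(BASE64_DIGITS))        # 64
print(BASE64_DIGITS.index("+"))  # 62
print(BASE64_DIGITS.index("/"))  # 63

# "=" is not a digit; it pads the output to a multiple of four characters.
print(b64encode(b"a"))           # b'YQ=='
```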
I’m just on a road trip with my gf today, but I will pick this back up later tonight and investigate further past where I got up to last night :)
Even with the same query strings and headers it just doesn't give clearance; we just get a reCAPTCHA if we pass the same user agent or referrer. It always ends with a 302 or 403.
Hello,
I have been getting the error below for a couple of days: js2py.internals.simplex.JsException: ReferenceError: atob is not defined
File "/home/maxx/.local/lib/python3.6/site-packages/js2py/base.py", line 1074, in get
return self.prototype.get(prop, throw)
File "/home/maxx/.local/lib/python3.6/site-packages/js2py/base.py", line 1079, in get
raise MakeError('ReferenceError', '%s is not defined' % prop)
js2py.internals.simplex.JsException: ReferenceError: atob is not defined
Is anyone else experiencing the same?
Thank you!