Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License
3.37k stars, 458 forks

Unable to find the challenge on hypixel.net #334

Closed: unixfox closed this issue 4 years ago

unixfox commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.8.1

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/lib/python3.8/site-packages
Requires: requests
Required-by:

Code snippet involved with the issue

import cfscrape

scraper = cfscrape.create_scraper(delay=10)  # returns a CloudflareScraper instance
# Or: scraper = cfscrape.CloudflareScraper()  # CloudflareScraper inherits from requests.Session
print (scraper.get("https://hypixel.net").content)  # => "<!DOCTYPE html><html><head>..."

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/cfscrape/__init__.py", line 251, in solve_challenge
    challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print (scraper.get("https://hypixel.net/").content)  # => "<!DOCTYPE html><html><head>..."
  File "/usr/lib/python3.8/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3.8/site-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/usr/lib/python3.8/site-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/usr/lib/python3.8/site-packages/cfscrape/__init__.py", line 290, in solve_challenge
    raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

URL of the Cloudflare-protected page

https://hypixel.net/

URL of Pastebin/Gist with HTML source of protected page

https://paste.ee/p/TiNCJ

lord8266 commented 4 years ago

So there seems to be another script right before the one required by the challenge:

<script type="text/javascript">
//<![CDATA[
!function(){var t=function(){try{return!!window.addEventListener}catch(t){return!1}},n=function(n,e){t()?document.addEventListener("DOMContentLoaded",n,e):document.attachEvent("onreadystatechange",n)};n(function(){document.getElementById("h-content").style.display="block"},!1)}();
//]]>
</script>

// ---- The one required below
<script type="text/javascript">
  //<![CDATA[
  (function(){
    var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
    b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
    b(function(){
      var a = document.getElementById('cf-content');a.style.display = 'block';
...
</script>

This might be an update to the Cloudflare challenge page, but I don't see it on kissanime.ru. Changing __init__.py L249 from

javascript = re.search(r'\<script type\=\"text\/javascript\"\>\n(.*?)\<\/script\>',body, flags=re.S).group(1) # find javascript

to

all_scripts = re.findall(r'\<script type\=\"text\/javascript\"\>\n(.*?)\<\/script\>',body, flags=re.S) # find javascript
javascript = next(filter(lambda w: "jschl-answer" in w,all_scripts))

finds the script containing "jschl-answer", which would most probably be in the challenge script.
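The selection step above can be tried in isolation. This is only a minimal sketch, not cfscrape's actual code: the helper name find_challenge_script and the sample page body are made up for illustration, and the regex is the same pattern as above with the backslash escapes dropped (they match the same literal characters). It returns None when no script matches, rather than letting next() raise StopIteration.

```python
import re

# Same pattern as in the proposed change, minus the redundant escapes.
SCRIPT_RE = re.compile(
    r'<script type="text/javascript">\n(.*?)</script>', flags=re.S
)


def find_challenge_script(body, marker="jschl-answer"):
    """Return the first inline script containing `marker`, or None.

    Hypothetical helper for illustration; not part of cfscrape.
    """
    scripts = SCRIPT_RE.findall(body)
    # A default of None avoids StopIteration when nothing matches.
    return next((s for s in scripts if marker in s), None)


# Made-up, heavily simplified page body with two inline scripts,
# mimicking the layout described in this comment.
sample_body = (
    '<script type="text/javascript">\n'
    '//<![CDATA[\n'
    'document.getElementById("h-content").style.display="block";\n'
    '//]]>\n'
    '</script>\n'
    '<script type="text/javascript">\n'
    '//<![CDATA[\n'
    'var a = document.getElementById("jschl-answer");\n'
    '//]]>\n'
    '</script>\n'
)

challenge = find_challenge_script(sample_body)
no_match = find_challenge_script("<html></html>")
```

With the sample body, the first script (the one touching "h-content") is skipped and the second one is returned. Whether "jschl-answer" stays a reliable marker depends on Cloudflare not changing the challenge page again.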
