Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License
3.35k stars, 458 forks

ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script. #416

Open gabrielepinto opened 3 years ago

gabrielepinto commented 3 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
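The check described above can be roughly sketched in code. This is only a heuristic under stated assumptions, not cfscrape's own detection logic: the function name `looks_like_cloudflare` is hypothetical, and the choice of status codes (403, 429, 503) is an assumption about how challenge or block pages are usually served. It relies on the `Server` and `CF-RAY` response headers that Cloudflare normally sets.

```python
def looks_like_cloudflare(status_code, headers):
    """Heuristic guess: is this response a Cloudflare challenge/block page?

    `headers` is any mapping of response header names to values,
    e.g. `response.headers` from requests.
    """
    # Normalize header names so plain dicts work as well as
    # case-insensitive header mappings.
    normalized = {name.lower(): value for name, value in headers.items()}
    server = normalized.get("server", "").lower()

    # Cloudflare normally identifies itself in "Server" and adds "CF-RAY".
    served_by_cloudflare = server.startswith("cloudflare") or "cf-ray" in normalized

    # Assumption: challenge and block pages arrive with one of these codes.
    return served_by_cloudflare and status_code in (403, 429, 503)
```

If this returns False for the page in question, the site may be protected by a different vendor, or may not be serving a challenge at all.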

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.8.3

cfscrape version number

Run pip show cfscrape and paste the output below:


Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: d:\anaconda\lib\site-packages
Requires: requests
Required-by: 

Code snippet involved with the issue

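import requests as re  # assumption: the poster aliased requests as "re" (this shadows the stdlib re module)
import cfscrape

url = "https://ftp.partitodemocratico.it/elezioni_trasparenti/index.html"
session = re.Session()
scraper = cfscrape.create_scraper(sess=session)
m = scraper.get(url)  # raises the ValueError shown below
m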

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
D:\anaconda\lib\site-packages\cfscrape\__init__.py in solve_challenge(self, body, domain)
    250 
--> 251             challenge, ms = re.search(
    252                 r"setTimeout\(function\(\){\s*(var "

AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-66-3d78a2930e19> in <module>
      1 session = re.Session()
      2 scraper = cfscrape.create_scraper(sess=session)
----> 3 m=scraper.get(url)
      4 m

D:\anaconda\lib\site-packages\requests\sessions.py in get(self, url, **kwargs)
    553 
    554         kwargs.setdefault('allow_redirects', True)
--> 555         return self.request('GET', url, **kwargs)
    556 
    557     def options(self, url, **kwargs):

D:\anaconda\lib\site-packages\cfscrape\__init__.py in request(self, method, url, *args, **kwargs)
    127         # Check if Cloudflare anti-bot "I'm Under Attack Mode" is enabled
    128         if self.is_cloudflare_iuam_challenge(resp):
--> 129             resp = self.solve_cf_challenge(resp, **kwargs)
    130 
    131         return resp

D:\anaconda\lib\site-packages\cfscrape\__init__.py in solve_cf_challenge(self, resp, **original_kwargs)
    202 
    203         # Solve the Javascript challenge
--> 204         answer, delay = self.solve_challenge(body, domain)
    205         if method == 'POST':
    206             cloudflare_kwargs["data"]["jschl_answer"] = answer

D:\anaconda\lib\site-packages\cfscrape\__init__.py in solve_challenge(self, body, domain)
    288             delay = self.delay or (float(ms) / float(1000) if ms else 8)
    289         except Exception:
--> 290             raise ValueError(
    291                 "Unable to identify Cloudflare IUAM Javascript on website. %s"
    292                 % BUG_REPORT

ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
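For anyone reading the traceback: the first exception shows that `re.search(...)` found no match and returned `None`, so reading `.groups()` from it raised `AttributeError`; the broad `except Exception` at line 289 then re-raised it as the `ValueError`. The sketch below reproduces that failure mode in isolation. The regular expression here is a hypothetical, simplified stand-in (the real pattern is truncated in the traceback), and `extract_challenge` is an illustrative name, not part of cfscrape.

```python
import re

# Hypothetical, simplified pattern; NOT the one cfscrape actually uses.
CHALLENGE_PATTERN = re.compile(
    r"setTimeout\(function\(\)\{\s*(var .+?)\}, (\d+)\)", re.DOTALL
)


def extract_challenge(body):
    """Return (challenge_js, delay_ms) or raise ValueError if absent."""
    match = CHALLENGE_PATTERN.search(body)
    if match is None:
        # Checking for None explicitly avoids the AttributeError seen above.
        raise ValueError("No challenge script found in page body")
    return match.groups()


page_without_challenge = "<html><body>Elezioni trasparenti</body></html>"
try:
    extract_challenge(page_without_challenge)
except ValueError as exc:
    print("Failed as expected:", exc)
```

In other words, the error means only that the page body did not contain the script the regex expects, whether because the challenge format changed or because the page is not an IUAM challenge page at all.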

URL of the Cloudflare-protected page

[https://ftp.partitodemocratico.it/elezioni_trasparenti/index.html]

URL of Pastebin/Gist with HTML source of protected page

[<!DOCTYPE HTML>

Elezioni trasparenti - Partito Democratico

Elezioni trasparenti

Elezioni -
Candidato

Curriculum Vitae - Certificato penale

Cerca:

]
rusq commented 3 years ago

Trying to get the RBNZ rates Excel file with cfscrape also results in this error message:

import cfscrape

URL="https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b1/hb1-daily.xlsx"

scraper = cfscrape.create_scraper()

with open("dump.bin", "wb") as f:
    f.write(scraper.get(URL).content)

(.venv) [1:wf/personal/rbcf> pip show cfscrape
Name: cfscrape
Version: 2.1.1

(.venv) [0:wf/personal/rbcf> python rbnz.py 
Traceback (most recent call last):
  File "/Users/x/wf/personal/rbcf/.venv/lib/python3.8/site-packages/cfscrape/__init__.py", line 251, in solve_challenge
    challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rbnz.py", line 8, in <module>
    f.write(scraper.get(URL).content)
  File "/Users/x/wf/personal/rbcf/.venv/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/Users/x/wf/personal/rbcf/.venv/lib/python3.8/site-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/Users/x/wf/personal/rbcf/.venv/lib/python3.8/site-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/Users/x/wf/personal/rbcf/.venv/lib/python3.8/site-packages/cfscrape/__init__.py", line 290, in solve_challenge
    raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
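
A side note on the script above: `dump.bin` is opened before the request is made, so when `scraper.get` raises, an empty file is left behind. A minimal sketch of fetching first and writing only on success follows. `save_download` and its `fetch` parameter are hypothetical names introduced for illustration, not part of cfscrape or requests.

```python
def save_download(fetch, path):
    """Call fetch() first and write to `path` only if it succeeds.

    `fetch` is any zero-argument callable returning bytes, for example
    `lambda: scraper.get(URL).content`.
    """
    content = fetch()  # may raise, e.g. the ValueError shown above
    with open(path, "wb") as f:
        f.write(content)
    return len(content)  # number of bytes written
```

With the script above this would be called as `save_download(lambda: scraper.get(URL).content, "dump.bin")`. It does not fix the underlying error, it only avoids leaving an empty file when the request fails.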
sabbathwd commented 3 years ago

Trying to parse https://hidemy.name/ for proxies with cfscrape and getting an exception

Versions:

Python 3.9.2
node v15.12.0
cfscrape 2.1.1

import cfscrape
from fake_useragent import UserAgent  # assumption: UserAgent comes from the fake_useragent package

scraper = cfscrape.create_scraper()
headers = {
    "user-agent": UserAgent().random
}

def get_pages_count():
    url = 'https://hidemy.name/en/proxy-list/#list'
    r = scraper.get(url, headers=headers).text

Exception

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/cfscrape/__init__.py", line 251, in solve_challenge
    challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pavelvolkov/ma_work/ma_services/proxy_collector/scripts/hidemyname.py", line 117, in <module>
    parse(0, {})
  File "/Users/pavelvolkov/ma_work/ma_services/proxy_collector/scripts/hidemyname.py", line 97, in parse
    links = get_pages_count()
  File "/Users/pavelvolkov/ma_work/ma_services/proxy_collector/scripts/hidemyname.py", line 21, in get_pages_count
    r = scraper.get(url, headers=headers).text
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/usr/local/lib/python3.9/site-packages/cfscrape/__init__.py", line 290, in solve_challenge
    raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
SpangleLabs commented 3 years ago

Yeah, it's broken, see #406