Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License
3.34k stars, 456 forks

ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script. #357

Status: Open. 00abCoder opened this issue 4 years ago.

00abCoder commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
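The template asks for Node 10 or higher. Below is a minimal sketch for checking that from Python. It assumes that node --version prints a string such as v10.16.0. The names parse_node_major and installed_node_major are hypothetical helpers. The code sticks to calls that exist in both Python 2.7 and Python 3.

```python
import re
import subprocess


def parse_node_major(version_output):
    """Return the major version from output such as 'v10.16.0' (assumed format)."""
    match = re.match(r"\s*v?(\d+)\.", version_output)
    if match is None:
        raise ValueError("unrecognized Node version string: %r" % (version_output,))
    return int(match.group(1))


def installed_node_major():
    """Hypothetical helper: major version of node or nodejs on PATH, else None."""
    for name in ("node", "nodejs"):
        try:
            output = subprocess.check_output([name, "--version"], universal_newlines=True)
        except OSError:  # no executable under this name; try the next one
            continue
        return parse_node_major(output)
    return None
```

A caller would treat a result of None, or any value below 10, as not meeting the requirement.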

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
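One rough way to tell which vendor sits in front of a site is to inspect the response headers. The sketch below is a heuristic that depends on several assumptions:

- A CF-RAY header, or a Server value starting with "cloudflare", indicates Cloudflare.
- An X-Iinfo header, or an X-CDN value mentioning Incapsula, indicates Imperva Incapsula.
- An X-Sucuri-ID header, or a Server value mentioning Sucuri, indicates Sucuri.

The name guess_protection_vendor is hypothetical, and vendors can change these headers at any time.

```python
def guess_protection_vendor(headers):
    """Hypothetical heuristic: guess the anti-bot vendor from response headers.

    The header markers below are common but are assumptions, not a contract
    published by any vendor, and they can change without notice.
    """
    lowered = dict((k.lower(), v.lower()) for k, v in headers.items())
    server = lowered.get("server", "")
    if "cf-ray" in lowered or server.startswith("cloudflare"):
        return "cloudflare"
    if "x-iinfo" in lowered or "incapsula" in lowered.get("x-cdn", ""):
        return "incapsula"
    if "x-sucuri-id" in lowered or "sucuri" in server:
        return "sucuri"
    return None  # none of the assumed markers were present
```

With requests, the headers attribute of a response supports .items(), so it can be passed in directly.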

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 2.7.12

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/local/lib/python2.7/dist-packages
Requires: requests

Code snippet involved with the issue

import cfscrape
url = "https://techblog.willshouse.com/2012/01/03/most-common-user-agents"
scraper = cfscrape.create_scraper()
content = scraper.get(url).content

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
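According to the traceback, the error is raised in solve_challenge, which apparently runs a regular-expression search over the challenge script. The minimal sketch below shows why that kind of extractor is brittle. It uses a deliberately simplified pattern, and the hypothetical CHALLENGE_RE and extract_challenge are not cfscrape's code. Any change in the served JavaScript, even added spaces, makes the search return None, and that result is turned into a ValueError.

```python
import re

# Hypothetical, deliberately simplified pattern; cfscrape's real one is longer.
CHALLENGE_RE = re.compile(r"setTimeout\(function\(\)\{\s*(var s,t,o,p.+?)\r?\n", re.S)


def extract_challenge(javascript):
    """Hypothetical helper: return the challenge script or raise ValueError."""
    match = CHALLENGE_RE.search(javascript)
    if match is None:
        # A change in the served JavaScript, even "var s, t" instead of
        # "var s,t", leaves the search empty and ends up here.
        raise ValueError(
            "Unable to identify Cloudflare IUAM Javascript on website."
        )
    return match.group(1)
```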

URL of the Cloudflare-protected page

https://techblog.willshouse.com/2012/01/03/most-common-user-agents

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

00abCoder commented 4 years ago

Changing line 250 of __init__.py to this solves the problem:

challenge, ms = re.search(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()
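The key part of this change appears to be the \s* before each variable name. That lets the pattern accept "var s, t, o, ..." as well as "var s,t,o,...". Below is a quick check of that idea. It assumes the pattern in this comment, with the backslashes and asterisks that the rendered comment dropped restored, which is itself an assumption about the original text. The two challenge snippets are made up, and real Cloudflare pages looked different. find_challenge is a hypothetical helper.

```python
import re

# Assumed reconstruction of the relaxed pattern; the \s* before each name
# tolerates spaces after the commas in the "var s,t,o,..." declaration.
RELAXED_RE = re.compile(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    re.S,
)


def find_challenge(javascript):
    """Hypothetical helper: (challenge_script, delay_ms) or None on no match."""
    match = RELAXED_RE.search(javascript)
    return match.groups() if match else None


# Made-up challenge snippets; they differ only in spacing of the var list.
COMPACT = (
    "setTimeout(function(){\n"
    "  var s,t,o,p,b,r,e,a,k,i,n,g,f, x = 1;\n"
    "  a.value = x;\n"
    "}, 4000);\n"
)
SPACED = (
    "setTimeout(function(){\n"
    "  var s, t, o, p, b, r, e, a, k, i, n, g, f, x = 1;\n"
    "  a.value = x;\n"
    "}, 4000);\n"
)
```

Both snippets should yield the same delay string, "4000", as the second group. That group is what the original code unpacked into ms.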

Dimitrenko commented 4 years ago

Great work, thank you so much. Please tell me: is it necessary to wait 5 seconds between requests?

00abCoder commented 4 years ago

It seems it is not necessary. I ran the following code, and it returned the same content for every request:

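import cfscrape

url = "https://techblog.willshouse.com/2012/01/03/most-common-user-agents"
scraper = cfscrape.create_scraper()  # one session, so cookies persist across requests
contents = []
for _ in range(5):
    # Five back-to-back requests with no pause between them
    content = scraper.get(url).content
    contents.append(content)
print(len(set(contents)) == 1)  # True if all five responses were identical

lord8266 commented 4 years ago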

Quoting Dimitrenko: "is it necessary to withstand a pause of 5 seconds between requests?"

That might depend on the site and on how frequently you send requests.
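If a site does need a pause, one option is to make the delay a parameter instead of hard-coding it. Below is a minimal sketch with a hypothetical fetch_all helper. The get argument is any callable that fetches a URL. The sleep argument is injectable, so the pacing can be tested without actually waiting. The 5-second default only mirrors the question in this thread and is not a documented Cloudflare requirement.

```python
import time


def fetch_all(get, url, times, delay_seconds=5.0, sleep=time.sleep):
    """Hypothetical helper: call get(url) `times` times, pausing between calls.

    No pause is taken before the first request or after the last one.
    """
    results = []
    for attempt in range(times):
        if attempt > 0:
            sleep(delay_seconds)
        results.append(get(url))
    return results
```

For example, fetch_all(lambda u: scraper.get(u).content, url, 5, delay_seconds=2) would reuse the scraper from the snippet above with a 2-second pause between requests.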

BruceLee569 commented 4 years ago

Quoting 00abCoder: "Changing line 250 of __init__.py to this solves the problem", followed by the same re.search fix.

@00abCoder @Anorov Thanks a lot, it's useful, so I opened a pull request against the master branch: https://github.com/Anorov/cloudflare-scrape/pull/360

Dimitrenko commented 4 years ago

Same problem again:

ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

The patched line no longer works:

challenge, ms = re.search(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()

iZooGooD commented 3 years ago

I'm facing the same problem. Nothing seems to work.

SpangleLabs commented 3 years ago

This project is abandoned, and the library is broken. See #406.