Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script. #369

Closed: guidesify closed this issue 4 years ago

guidesify commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.8.2

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/local/lib/python3.8/dist-packages
Requires: requests
Required-by:

Code snippet involved with the issue

    import cfscrape
    from bs4 import BeautifulSoup

    URL = url  # url is supplied by the calling function
    hdr = {'User-Agent': 'Mozilla/5.0'}  # bypass 403 Forbidden error
    # page = requests.get(URL, headers=hdr)
    # page = requests.get(URL)
    scraper = cfscrape.create_scraper()
    page = scraper.get(URL, headers=hdr).content
    soup = BeautifulSoup(page, 'lxml')

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/cfscrape/__init__.py", line 251, in solve_challenge
    challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "app.py", line 32, in link
    elif app_list == '11': results = get_elementor(url_list)
  File "/home/minitools/decorators.py", line 11, in wrapper
    return function(url, *args, **kwargs)
  File "/home/minitools/api.py", line 66, in get_elementor
    page = scraper.get(URL).content
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/usr/local/lib/python3.8/dist-packages/cfscrape/__init__.py", line 290, in solve_challenge
    raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
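
The traceback shows the ValueError propagating out of scraper.get() and up through Flask unhandled. A minimal sketch of catching it at the call site could look like the following. `fetch_or_none` is a hypothetical helper name, and `session` is assumed to be the object returned by cfscrape.create_scraper(), although any object with a requests-style get() method would work.

```python
def fetch_or_none(session, url, **kwargs):
    """Return the response body, or None if the challenge could not be solved.

    Hypothetical helper: `session` is assumed to be a cfscrape scraper
    (or any object exposing a requests-style get method).
    """
    try:
        return session.get(url, **kwargs).content
    except ValueError as exc:
        # cfscrape raises ValueError when it cannot identify the IUAM
        # JavaScript, as in the traceback above.
        print(f"Cloudflare challenge not solved for {url}: {exc}")
        return None
```

This only keeps the web app from returning a 500; it does not make the challenge solvable.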

URL of the Cloudflare-protected page

https://guidesify.com, https://opinion.guidesify.com

This is my site, which is behind Cloudflare and uses the SSL setting "Full (strict)": encrypts end-to-end, but requires a trusted CA or Cloudflare Origin CA certificate on the server. The origin certificate is installed on my server.

All my other sites work and this is the only difference in settings.

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

guidesify commented 4 years ago

I was looking at the Firewall events. They say that the request was challenged because of my firewall rules, with the country shown as "Unknown states, other entities or organizations".

Any idea how I can get past that?

guidesify commented 4 years ago

The error was due to Google Cloud Platform's IP addresses being located in the US regardless of the VM's location. My workaround was to use a dedicated proxy server in my Python script.
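
For anyone landing here later, a rough sketch of the proxy workaround is below. The proxy host, port, and credentials are placeholders, and `build_proxies` and `fetch_via_proxy` are hypothetical helper names. It assumes the scraper from cfscrape.create_scraper() accepts the standard requests `proxies` argument, which it should since it is a requests session.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None):
    """Build a requests-style proxies mapping for one HTTP proxy.

    Hypothetical helper; host, port and credentials are placeholders.
    """
    auth = ""
    if user is not None and password is not None:
        # Percent-encode credentials so characters like '@' or ':' are safe.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}:{port}"
    # Route both plain and TLS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(session, url, proxies, headers=None, timeout=30):
    """Fetch url through the proxy using a requests-compatible session."""
    resp = session.get(url, proxies=proxies, headers=headers, timeout=timeout)
    return resp.content


if __name__ == "__main__":
    import cfscrape  # assumed to be installed: pip install -U cfscrape

    scraper = cfscrape.create_scraper()
    proxies = build_proxies("proxy.example.com", 8080)  # placeholder proxy
    page = fetch_via_proxy(scraper, "https://guidesify.com", proxies,
                           headers={"User-Agent": "Mozilla/5.0"})
```

The proxy's IP address has to be one that the site's firewall rules do not challenge, otherwise the same error is likely to come back.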